<p style="align:left;">
    <img src="https://upload.wikimedia.org/wikipedia/commons/6/64/Rome_banner_panorama.jpg" style="width:100%;"><br>
    <span style="font-size:80%;">Source: <a href="https://it.wikivoyage.org/wiki/Roma">https://it.wikivoyage.org/wiki/Roma</a></span>
</p>

<h1 align=center>The Battle of Neighborhoods</h1>
<h2 align=center>Data</h2>

<h3>Rome's neighborhoods (Municipi)</h3>

For this project, I need data about Rome's neighborhoods (called Municipi). These data are available on the Italian Open Data portal (<a href="http://www.datiopen.it/">http://www.datiopen.it/</a>) as the dataset "Municipi di Roma".
I downloaded the shapefile and processed it with QGIS (<a href="https://www.qgis.org/">https://www.qgis.org/</a>) to export a GeoJSON file that associates each neighborhood with the latitude and longitude of its centroid.
The resulting file is available here: <a href="http://5.249.144.7/donatellagubiani/Coursera/municipi_centroidi_js.geojson">municipi_centroidi_js.geojson</a>.

Before showing the data, I install and import all the required libraries.

In [1]:
!pip install geopy
!pip install folium

print('Libraries installed!')

Collecting geopy
  Downloading https://files.pythonhosted.org/packages/53/fc/3d1b47e8e82ea12c25203929efb1b964918a77067a874b2c7631e2ec35ec/geopy-1.21.0-py2.py3-none-any.whl (104kB)
Collecting geographiclib<2,>=1.49 (from geopy)
  Downloading https://files.pythonhosted.org/packages/8b/62/26ec95a98ba64299163199e95ad1b0e34ad3f4e176e221c40245f211e425/geographiclib-1.50-py3-none-any.whl
Installing collected packages: geographiclib, geopy
Successfully installed geographiclib-1.50 geopy-1.21.0
Libraries installed!


In [2]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

import requests # library to handle requests
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

from pandas.io.json import json_normalize # transform a JSON file into a pandas dataframe

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


Now I can download the GeoJSON file, load it, and extract the required data (Neighborhood, Latitude, Longitude) into a pandas dataframe.

In [3]:
# download data
!wget -q -O 'rome_data.json' http://5.249.144.7/donatellagubiani/Coursera/municipi_centroidi_js.geojson
# load geojson data
with open('rome_data.json') as json_data:
    rome = json.load(json_data)
# define the required dataframe columns
column_names = ['Neighborhood', 'Latitude', 'Longitude']
# collect one row per feature; GeoJSON points are stored as [longitude, latitude]
data_rome = rome['features']
rows = []
for data in data_rome:
    neighborhood_name = data['properties']['municipio']
    neighborhood_latlon = data['geometry']['coordinates']
    neighborhood_lat = neighborhood_latlon[1]
    neighborhood_lon = neighborhood_latlon[0]
    rows.append({'Neighborhood': 'Municipio ' + neighborhood_name,
                 'Latitude': neighborhood_lat,
                 'Longitude': neighborhood_lon})
# build the dataframe in one step (DataFrame.append was removed in pandas 2.0)
neighborhoods_rome = pd.DataFrame(rows, columns=column_names)

# quick look at the data
neighborhoods_rome

Unnamed: 0,Neighborhood,Latitude,Longitude
0,Municipio 1,41.898628,12.47795
1,Municipio 2,41.920704,12.50118
2,Municipio 3,41.996439,12.554324
3,Municipio 4,41.932162,12.591343
4,Municipio 5,41.889105,12.575565
5,Municipio 6,41.88755,12.687117
6,Municipio 7,41.839042,12.581899
7,Municipio 8,41.828667,12.529168
8,Municipio 9,41.757649,12.497813
9,Municipio 10,41.73939,12.3631
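
The extraction above relies on GeoJSON storing point coordinates as [longitude, latitude], which is why index 1 is read as the latitude. A minimal, standard-library sketch of the same step on an inline sample is given here. The `municipio` property name comes from the file used in this notebook; `SAMPLE_GEOJSON` and `extract_centroids` are hypothetical names introduced only for illustration, and the property values are assumed to be strings.

```python
import json

# Hypothetical inline sample shaped like the centroid file (two features only).
SAMPLE_GEOJSON = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"municipio": "1"},
     "geometry": {"type": "Point", "coordinates": [12.47795, 41.898628]}},
    {"type": "Feature",
     "properties": {"municipio": "2"},
     "geometry": {"type": "Point", "coordinates": [12.50118, 41.920704]}}
  ]
}
"""


def extract_centroids(geojson_text):
    """Return (name, latitude, longitude) tuples from a FeatureCollection of points."""
    collection = json.loads(geojson_text)
    rows = []
    for feature in collection["features"]:
        # GeoJSON positions are ordered [longitude, latitude].
        lon, lat = feature["geometry"]["coordinates"][:2]
        name = "Municipio " + str(feature["properties"]["municipio"])
        rows.append((name, lat, lon))
    return rows


centroids = extract_centroids(SAMPLE_GEOJSON)
for row in centroids:
    print(row)
```

Swapping the two indices is an easy mistake to make and would place the markers far from Rome, so it is worth checking one known centroid on the map.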


Before creating a map, I determine the geographical coordinates of Rome.

In [4]:
address = 'Rome, Italy'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Rome (Italy) are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Rome (Italy) are 41.8933203, 12.4829321.


Now I can show the map with the points that represent the neighborhoods of Rome.

In [5]:
# create map of Rome using latitude and longitude values
map_rome = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, neighborhood in zip(neighborhoods_rome['Latitude'], neighborhoods_rome['Longitude'], neighborhoods_rome['Neighborhood']):
    label = '{}'.format(neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_rome)  
    
map_rome

<h3>Rome's venues</h3>

<p>Next, I need data about monuments and restaurants in Rome, which I can obtain by using the Foursquare API to explore the neighborhoods.</p>

In [6]:
# @hidden_cell
#
CLIENT_ID = 'OGIFOMCZEM2SUL2DP4BFA4LYHGX2KULMKCXM0WSIHKI0LAYD' # your Foursquare ID
CLIENT_SECRET = 'YDY4FRTT0V4DKBOTEVSQ34XOHC4KX5FZ2IASOQT2BDFITTDR' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version
#print('Your credentials:')
#print('CLIENT_ID: ' + CLIENT_ID)
#print('CLIENT_SECRET:' + CLIENT_SECRET)

After setting my Foursquare credentials, I can obtain the required data. The function defined in the next cell, getNearbyVenues, explores the venues around all the neighborhoods in Rome (defaults: radius=500, limits=100).

I run this function on each neighborhood and create a new dataframe called rome_venues. One note: after checking the typical distance between neighborhoods, I decided to set the radius to 6000 meters. As the map shows, this gives good coverage.

In [7]:
def getNearbyVenues(names, latitudes, longitudes, radius=500, limits=100):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limits)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
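
The function above indexes straight into `response`, `groups`, `items` and `categories[0]`, so a reply without groups (for example an error reply) or a venue without categories would raise an exception. The sketch below shows one possible way to make that step tolerant. `extract_venues` and `fake_payload` are hypothetical names, and the payload only mimics the nesting used by the code above; it is not a full Foursquare response.

```python
def extract_venues(payload, name, lat, lng):
    """Flatten one explore-style payload into rows like those built by getNearbyVenues."""
    groups = payload.get("response", {}).get("groups", [])
    items = groups[0].get("items", []) if groups else []
    rows = []
    for item in items:
        venue = item["venue"]
        categories = venue.get("categories") or []
        # Fall back to None instead of raising when a venue has no category.
        category = categories[0]["name"] if categories else None
        rows.append((name, lat, lng,
                     venue["name"],
                     venue["location"]["lat"],
                     venue["location"]["lng"],
                     category))
    return rows


# Hypothetical payload with the same nesting the function above indexes into.
fake_payload = {"response": {"groups": [{"items": [
    {"venue": {"name": "Pantheon",
               "location": {"lat": 41.899133, "lng": 12.476805},
               "categories": [{"name": "Monument / Landmark"}]}},
    {"venue": {"name": "Uncategorised place",
               "location": {"lat": 41.9, "lng": 12.48},
               "categories": []}},
]}]}}

rows = extract_venues(fake_payload, "Municipio 1", 41.898628, 12.47795)
empty = extract_venues({}, "Municipio 1", 41.898628, 12.47795)
print(len(rows), len(empty))
```

Keeping the parsing in a separate function also makes it possible to test it without calling the API.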

In [8]:
rome_venues = getNearbyVenues(names=neighborhoods_rome['Neighborhood'],
                              latitudes=neighborhoods_rome['Latitude'],
                              longitudes=neighborhoods_rome['Longitude'],
                              radius=6000)
#rome_venues.to_csv('rome_venues_6000_100.csv')
print(rome_venues.shape)
rome_venues.head()

Municipio 1
Municipio 2
Municipio 3
Municipio 4
Municipio 5
Municipio 6
Municipio 7
Municipio 8
Municipio 9
Municipio 10
Municipio 11
Municipio 12
Municipio 13
Municipio 14
Municipio 15
(1036, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Municipio 1,41.898628,12.47795,Pantheon,41.899133,12.476805,Monument / Landmark
1,Municipio 1,41.898628,12.47795,Pizza e Mozzarella,41.897598,12.479097,Pizza Place
2,Municipio 1,41.898628,12.47795,Il Panino Ingegnoso,41.899982,12.479195,Sandwich Place
3,Municipio 1,41.898628,12.47795,Piazza della Rotonda,41.899253,12.476779,Plaza
4,Municipio 1,41.898628,12.47795,Venchi,41.900042,12.480883,Ice Cream Shop
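
The radius of 6000 used here was chosen after looking at the distances between neighborhoods. One way such a check could be done is to compute, for each centroid, the great-circle distance to its nearest neighbor. The sketch below uses the haversine formula on the first five centroids from the neighborhoods table; `haversine_m` and `nearest_neighbor_distances` are hypothetical helper names, and the Earth radius is the usual mean value.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Centroids copied from the neighborhoods table (first five rows only).
centroids = {
    "Municipio 1": (41.898628, 12.47795),
    "Municipio 2": (41.920704, 12.50118),
    "Municipio 3": (41.996439, 12.554324),
    "Municipio 4": (41.932162, 12.591343),
    "Municipio 5": (41.889105, 12.575565),
}


def nearest_neighbor_distances(points):
    """Map each name to the distance (m) to the closest other point."""
    result = {}
    for name, (lat, lon) in points.items():
        result[name] = min(
            haversine_m(lat, lon, other_lat, other_lon)
            for other, (other_lat, other_lon) in points.items()
            if other != name
        )
    return result


distances = nearest_neighbor_distances(centroids)
for name, dist in distances.items():
    print(f"{name}: {dist:.0f} m")
```

A radius larger than these spacings means that neighboring searches overlap, so the same venue can be returned for more than one neighborhood.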


I then show them on a map.

In [9]:
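# reuse the neighborhood map so that the venues are drawn on top of the centroids
# (map_romeV is the same object as map_rome, so the markers are added to map_rome as well)
map_romeV = map_rome #folium.Map(location=[latitude, longitude], zoom_start=10)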

# add markers to map
for lat, lng, venue, category, neighborhood in zip(rome_venues['Venue Latitude'], rome_venues['Venue Longitude'], rome_venues['Venue'], rome_venues['Venue Category'], rome_venues['Neighborhood']):
    label = '{} ({} in {})'.format(venue, category, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='red',
        fill=True,
        fill_color='#cc3186',
        fill_opacity=0.7,
        parse_html=False).add_to(map_romeV)  
    
map_romeV

I check the categories and count the venues in each category.

In [10]:
rome_venues.groupby(['Venue Category']).agg(['count'])

Venue Category,Neighborhood (count),Neighborhood Latitude (count),Neighborhood Longitude (count),Venue (count),Venue Latitude (count),Venue Longitude (count)
American Restaurant,3,3,3,3,3,3
Art Museum,9,9,9,9,9,9
Asian Restaurant,9,9,9,9,9,9
Athletics & Sports,6,6,6,6,6,6
Auditorium,1,1,1,1,1,1
Automotive Shop,1,1,1,1,1,1
BBQ Joint,4,4,4,4,4,4
Bakery,9,9,9,9,9,9
Bar,5,5,5,5,5,5
Basketball Stadium,1,1,1,1,1,1
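
The grouped table repeats the same count in every column. When only one number per category is needed, a plain tally is enough; in pandas, `rome_venues['Venue Category'].value_counts()` would give a similar result. The standard-library sketch below shows the idea on the five sample rows printed by `rome_venues.head()`. The `FOOD_KEYWORDS` filter is a hypothetical example of how food-related categories might be selected, not a rule used in this notebook.

```python
from collections import Counter

# The five sample rows shown by rome_venues.head(): (venue, category).
sample_venues = [
    ("Pantheon", "Monument / Landmark"),
    ("Pizza e Mozzarella", "Pizza Place"),
    ("Il Panino Ingegnoso", "Sandwich Place"),
    ("Piazza della Rotonda", "Plaza"),
    ("Venchi", "Ice Cream Shop"),
]

# One count per category instead of one count per column.
category_counts = Counter(category for _, category in sample_venues)

# Hypothetical keyword filter for food-related categories.
FOOD_KEYWORDS = ("restaurant", "pizza", "sandwich", "ice cream", "bakery")
food_counts = {
    category: count
    for category, count in category_counts.items()
    if any(keyword in category.lower() for keyword in FOOD_KEYWORDS)
}

print(category_counts.most_common())
print(food_counts)
```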


I also check the number of venues for each neighborhood.

In [11]:
rome_venues.groupby(['Neighborhood']).agg(['count'])

Neighborhood,Neighborhood Latitude (count),Neighborhood Longitude (count),Venue (count),Venue Latitude (count),Venue Longitude (count),Venue Category (count)
Municipio 1,100,100,100,100,100,100
Municipio 10,89,89,89,89,89,89
Municipio 11,73,73,73,73,73,73
Municipio 12,32,32,32,32,32,32
Municipio 13,21,21,21,21,21,21
Municipio 14,30,30,30,30,30,30
Municipio 15,32,32,32,32,32,32
Municipio 2,100,100,100,100,100,100
Municipio 3,88,88,88,88,88,88
Municipio 4,100,100,100,100,100,100


From some tests, I verified that some important venues are missing from this dataframe, for example the Colosseo. I tried increasing the radius, but the Colosseo appears only with a large radius, and then it is attached to a neighborhood that is not the closest one. This happens because the output of each request is limited to 100 venues, even if the limits parameter is set to 200.

In [12]:
rome_venues_200 = getNearbyVenues(names=neighborhoods_rome['Neighborhood'],
                              latitudes=neighborhoods_rome['Latitude'],
                              longitudes=neighborhoods_rome['Longitude'],
                              radius=6000,
                              limits=200)
rome_venues_200.to_csv('rome_venues_6000_200.csv')
print(rome_venues_200.shape)
rome_venues_200.head()

Municipio 1
Municipio 2
Municipio 3
Municipio 4
Municipio 5
Municipio 6
Municipio 7
Municipio 8
Municipio 9
Municipio 10
Municipio 11
Municipio 12
Municipio 13
Municipio 14
Municipio 15
(1036, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Municipio 1,41.898628,12.47795,Pantheon,41.899133,12.476805,Monument / Landmark
1,Municipio 1,41.898628,12.47795,Pizza e Mozzarella,41.897598,12.479097,Pizza Place
2,Municipio 1,41.898628,12.47795,Il Panino Ingegnoso,41.899982,12.479195,Sandwich Place
3,Municipio 1,41.898628,12.47795,Piazza della Rotonda,41.899253,12.476779,Plaza
4,Municipio 1,41.898628,12.47795,Venchi,41.900042,12.480883,Ice Cream Shop


In [13]:
rome_venues_200.groupby(['Neighborhood']).agg(['count'])

Neighborhood,Neighborhood Latitude (count),Neighborhood Longitude (count),Venue (count),Venue Latitude (count),Venue Longitude (count),Venue Category (count)
Municipio 1,100,100,100,100,100,100
Municipio 10,89,89,89,89,89,89
Municipio 11,73,73,73,73,73,73
Municipio 12,32,32,32,32,32,32
Municipio 13,21,21,21,21,21,21
Municipio 14,30,30,30,30,30,30
Municipio 15,32,32,32,32,32,32
Municipio 2,100,100,100,100,100,100
Municipio 3,88,88,88,88,88,88
Municipio 4,100,100,100,100,100,100
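
Because the search circles overlap, a venue can be reported under a neighborhood whose centroid is not the closest one, as happened with the Colosseo. One possible remedy is to re-assign each venue to the nearest centroid after downloading. The sketch below uses an equirectangular approximation, which is adequate at city scale. `approx_distance_m` and `closest_neighborhood` are hypothetical helpers, only four centroids from the neighborhoods table are used, and the Colosseo coordinates are approximate values assumed for illustration.

```python
from math import cos, hypot, radians

EARTH_RADIUS_M = 6371000  # mean Earth radius in meters


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation of the distance in meters (fine for short ranges)."""
    mean_lat = radians((lat1 + lat2) / 2)
    dx = radians(lon2 - lon1) * cos(mean_lat)
    dy = radians(lat2 - lat1)
    return EARTH_RADIUS_M * hypot(dx, dy)


# A subset of the centroids from the neighborhoods table.
centroids = {
    "Municipio 1": (41.898628, 12.47795),
    "Municipio 2": (41.920704, 12.50118),
    "Municipio 5": (41.889105, 12.575565),
    "Municipio 8": (41.828667, 12.529168),
}


def closest_neighborhood(lat, lon, points):
    """Return the name of the centroid nearest to (lat, lon)."""
    return min(points, key=lambda name: approx_distance_m(lat, lon, *points[name]))


# Approximate coordinates of the Colosseo (assumption, not taken from the dataset).
colosseo = (41.8902, 12.4922)
owner = closest_neighborhood(*colosseo, centroids)
distance_to_owner = approx_distance_m(*colosseo, *centroids[owner])
print(owner, round(distance_to_owner))
```

Nearest-centroid assignment is only a rough proxy for the real administrative boundary; a point-in-polygon test against the original shapefile would be more precise.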


Given this limitation, I decided to use the dataset to compare the neighborhoods and evaluate which one, according to my preferences, is the best place to book a hotel. I therefore analyse the data around hotels.

<h3>Rome's hotels</h3>

As a second step of my analysis, I get the hotels in each neighborhood and then, for each hotel, the closest venues.

To get the hotels, I slightly modify the previous function (getNearbyVenues), adding a categoryId query parameter so that only a specific category (hotel) is explored.
Then, I run the new function on each neighborhood and create a new dataframe called rome_hotels. The radius is set to 6000 for the same reason as in the first exploration (good coverage).

In [14]:
def getNearbyHotel(names, latitudes, longitudes, radius, limits):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        #print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}&categoryId={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            limits,
            '4bf58dd8d48988d1fa931735') #id for Hotel from https://developer.foursquare.com/docs/resources/categories
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return(nearby_venues)
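
getNearbyHotel differs from getNearbyVenues only in the extra categoryId query parameter, so the URL construction could be shared. The sketch below is one possible way to do that with an optional argument. `build_explore_url` and its parameter names are hypothetical; the endpoint, the query parameter names and the hotel category id are the ones already used in this notebook, and the credentials are placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlparse

EXPLORE_ENDPOINT = "https://api.foursquare.com/v2/venues/explore"
HOTEL_CATEGORY_ID = "4bf58dd8d48988d1fa931735"  # Hotel id used in getNearbyHotel


def build_explore_url(client_id, client_secret, version, lat, lng,
                      radius=500, limit=100, category_id=None):
    """Build the explore URL; add categoryId only when a category is requested."""
    params = {
        "client_id": client_id,
        "client_secret": client_secret,
        "v": version,
        "ll": f"{lat},{lng}",
        "radius": radius,
        "limit": limit,
    }
    if category_id is not None:
        params["categoryId"] = category_id
    # urlencode escapes the values, so no manual string formatting is needed.
    return EXPLORE_ENDPOINT + "?" + urlencode(params)


# Placeholder credentials; replace them with real ones before sending a request.
url_all = build_explore_url("MY_ID", "MY_SECRET", "20180605",
                            41.898628, 12.47795, radius=6000)
url_hotels = build_explore_url("MY_ID", "MY_SECRET", "20180605",
                               41.898628, 12.47795, radius=6000,
                               category_id=HOTEL_CATEGORY_ID)
print(url_hotels)
```

With a builder like this, a single venue function with a `category_id` argument would cover both the general and the hotel-only exploration.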

In [16]:
rome_hotels = getNearbyHotel(names=neighborhoods_rome['Neighborhood'],
                              latitudes=neighborhoods_rome['Latitude'],
                              longitudes=neighborhoods_rome['Longitude'],
                              radius=6000,
                              limits=100)
rome_hotels.to_csv('rome_hotels_6000_100.csv')
print(rome_hotels.shape)
rome_hotels.head()

(468, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,Municipio 1,41.898628,12.47795,Hotel de Russie,41.910126,12.477775,Hotel
1,Municipio 1,41.898628,12.47795,The First Luxury Art Hotel Roma,41.908661,12.475566,Hotel
2,Municipio 1,41.898628,12.47795,iQ Hotel Roma,41.900426,12.495174,Hotel
3,Municipio 1,41.898628,12.47795,Hotel Indigo Rome - St. George,41.898221,12.466171,Hotel
4,Municipio 1,41.898628,12.47795,Hotel Majestic,41.905438,12.488135,Hotel


In [17]:
# create map of Rome with all hotels
map_romeH = folium.Map(location=[latitude, longitude], zoom_start=10)

# add markers to map
for lat, lng, venue, category, neighborhood in zip(rome_hotels['Venue Latitude'], rome_hotels['Venue Longitude'], rome_hotels['Venue'], rome_hotels['Venue Category'], rome_hotels['Neighborhood']):
    label = '{} ({} in {})'.format(venue, category, neighborhood)
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=3,
        popup=label,
        color='green',
        fill=True,
        fill_color='#31cc86',
        fill_opacity=0.7,
        parse_html=False).add_to(map_romeH)  
    
map_romeH

In [18]:
rome_hotels.groupby(['Neighborhood']).agg(['count'])

Neighborhood,Neighborhood Latitude (count),Neighborhood Longitude (count),Venue (count),Venue Latitude (count),Venue Longitude (count),Venue Category (count)
Municipio 1,100,100,100,100,100,100
Municipio 10,5,5,5,5,5,5
Municipio 11,14,14,14,14,14,14
Municipio 12,11,11,11,11,11,11
Municipio 13,16,16,16,16,16,16
Municipio 14,7,7,7,7,7,7
Municipio 15,7,7,7,7,7,7
Municipio 2,100,100,100,100,100,100
Municipio 3,8,8,8,8,8,8
Municipio 4,42,42,42,42,42,42


Now, using the last dataset, I explore the closest venues for each hotel. For this I can reuse the first function, getNearbyVenues, which is not restricted to hotels. In this case, I set the radius to 500 so that each hotel is evaluated only with respect to its closest venues.

Because of Foursquare's usage limits, I will first complete the initial analysis and determine which neighborhoods/hotels to explore; only then will I retrieve the related venues and build a dataframe for the selected elements.
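
Exploring the venues around every hotel costs one request per hotel, so it helps to estimate the number of calls before choosing what to explore. The sketch below does this with the hotel counts of the ten neighborhoods listed in the table above (the full dataframe has 468 rows). `requests_needed` is a hypothetical helper, and the shortlist is only an example.

```python
# Hotel counts per neighborhood, copied from the rows shown in the table above.
hotel_counts = {
    "Municipio 1": 100,
    "Municipio 10": 5,
    "Municipio 11": 14,
    "Municipio 12": 11,
    "Municipio 13": 16,
    "Municipio 14": 7,
    "Municipio 15": 7,
    "Municipio 2": 100,
    "Municipio 3": 8,
    "Municipio 4": 42,
}


def requests_needed(counts, selected):
    """One explore call per hotel in the selected neighborhoods."""
    return sum(counts[name] for name in selected)


all_listed = requests_needed(hotel_counts, hotel_counts)
shortlist = requests_needed(hotel_counts, ["Municipio 1", "Municipio 2"])
print(all_listed, shortlist)
```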