# Best neighbourhoods in Naples to open an office

A growing Italian IT company is planning to open a new office.
Its first office is in Naples, and the second one should be there too.
I will analyse which location is best for the new office. My goal is to satisfy the needs of both the entrepreneur and the employees, in order to create the best working environment. The office should therefore be in an area that is easy to reach and that has restaurants and coffee shops where the employees can enjoy their breaks.
This analysis can be reused by anyone who intends to open a new office in Naples and cares about the happiness of their employees.

To retrieve the list of neighborhoods in Naples, I will send a GET request to Wikipedia (https://it.wikipedia.org/wiki/Quartieri_di_Napoli). Then I will use the list on this website: https://news.unicreditsubitocasa.it/vendere-e-comprare/napoli-quartieri-prezzi-immobili-trasporti/ to pick out the best neighborhoods and to create a map of them, in order to get a better understanding of the area we are going to analyse.
I will use a CSV file with the coordinates of the stations in each neighborhood, and finally I will use the Foursquare API to figure out which is the ideal place for a new office: one near a train station and with a good choice of restaurants and coffee shops for the workers.

## Methodology

In [None]:
!pip install beautifulsoup4
!pip install folium

In [16]:
#import section

import numpy as np 
import pandas as pd 

from geopy.geocoders import Nominatim

import requests
from bs4 import BeautifulSoup
# pandas >= 1.0 exposes json_normalize at the top level; the pandas.io.json path is deprecated
from pandas import json_normalize

import matplotlib.cm as cm
import matplotlib.colors as colors

from sklearn.cluster import KMeans

import folium

In [7]:
wiki_link = 'https://it.wikipedia.org/wiki/Quartieri_di_Napoli'
wiki_page = requests.get(wiki_link)

#I use the BeautifulSoup package to parse the entire HTML text
page = BeautifulSoup(wiki_page.text, 'html.parser')

#I use the find_all function of BeautifulSoup to find all the table tags
#and take only the first one, which is the one I am interested in
table = page.find_all('table')[0]

#I use the find_all function again to get the rows of the table as a list
rows = table.find_all('tr')

#I will retrieve only the neighborhood name, which is the only thing we're interested in
columns = ['Neighborhood']

#I use the find_all function again to retrieve all the cells in a row.
#Then I take only the text of the first cell and collect it in a list.
#DataFrame.append was removed in pandas 2.0, so the dataframe is built in one go
names = []
for row in rows[1:]:
    elements = row.find_all('td')
    names.append(elements[0].text)

nap = pd.DataFrame({'Neighborhood': names})

In [8]:
nap.head()

Unnamed: 0,Neighborhood
0,Arenella
1,Avvocata
2,Bagnoli
3,Barra
4,Chiaia


In [9]:
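_link = 'https://news.unicreditsubitocasa.it/vendere-e-comprare/napoli-quartieri-prezzi-immobili-trasporti/'
_page = requests.get(_link)
page = BeautifulSoup(_page.text, 'html.parser')
#the neighborhood names are taken from the h3 headings of the page
neighbohs = page.find_all('h3')

#[10:] drops the first 10 characters of each heading, which appear to be a fixed-length
#prefix before the neighborhood name; headings that are not neighborhoods become junk
#rows and are cleaned out later
best_names = [neighb.text[10:] for neighb in neighbohs]
best_neighbh = pd.DataFrame({'Neighborhood': best_names})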

In [10]:
best_neighbh

Unnamed: 0,Neighborhood
0,Chiaia
1,rico
2,Vomero
3,Posillipo
4,Fuorigrotta
5,orrelati
6,interessarti anche:


I'll compare the two dataframes to clean the bad data out of the best_neighbh dataframe.
This also works as a double check on the data: only the names that appear in both sources are kept.

In [11]:
#an inner merge keeps only the names that appear in both dataframes
final_df = pd.merge(nap, best_neighbh, on='Neighborhood')
final_df

Unnamed: 0,Neighborhood
0,Chiaia
1,Fuorigrotta
2,Posillipo
3,Vomero
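
The double check above can also be sketched without pandas, as a plain set intersection. This is only an illustration: `wiki_names` is a hand-copied subset of the Wikipedia list (the rows shown by `nap.head()` plus the names that survive the merge), not the full scrape, and the variable names are assumptions.

```python
# Hand-copied subset of the Wikipedia neighborhoods (assumption: not the full list)
wiki_names = {'Arenella', 'Avvocata', 'Bagnoli', 'Barra', 'Chiaia',
              'Fuorigrotta', 'Posillipo', 'Vomero'}

# Names scraped from the h3 headings of the second website, junk included
scraped_names = ['Chiaia', 'rico', 'Vomero', 'Posillipo', 'Fuorigrotta',
                 'orrelati', 'interessarti anche:']

# Keep only the scraped names that are real neighborhoods, sorted alphabetically
best_names = sorted(set(scraped_names) & wiki_names)
print(best_names)
```

The junk headings disappear because they never match a name in the reference list, which is the same effect the inner merge has on the two dataframes.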


Now I read the CSV file with the stations from my IBM Cloud storage.

In [12]:
# The code was removed by Watson Studio for sharing.

Unnamed: 0,Neighborhood,Train Station,Latitude,Longitude
0,Chiaia,Piazza Amedeo,40.833303,14.221684
1,Fuorigrotta,Cavalleggeri Aosta,40.82256,14.187219
2,Vomero,Vanvitelli,40.843586,14.222435
3,Vomero,Quattro giornate,40.845884,14.2231


In [13]:
final_df = pd.merge(final_df, stations, on='Neighborhood', how='right')
final_df.head()

Unnamed: 0,Neighborhood,Train Station,Latitude,Longitude
0,Chiaia,Piazza Amedeo,40.833303,14.221684
1,Fuorigrotta,Cavalleggeri Aosta,40.82256,14.187219
2,Vomero,Vanvitelli,40.843586,14.222435
3,Vomero,Quattro giornate,40.845884,14.2231


Let's plot a map of the area, drawing a circle at each station.

In [17]:
address = 'Naples, NA'

geolocator = Nominatim(user_agent="nap_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude

nap_map = folium.Map(location=[latitude, longitude], zoom_start=13)

# add markers to map
for lat, lng, label in zip(final_df['Latitude'], final_df['Longitude'], final_df['Train Station']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7).add_to(nap_map)  
    
nap_map

As you can see, there is no station in the Posillipo neighborhood, and there are two in Vomero.
I chose to analyse the Vomero neighborhood because it is the easiest one to reach.
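
To put a number on this choice, here is a minimal sketch that counts the stations per neighborhood and estimates how far apart the two Vomero stations are. The rows are copied from the stations table; the haversine helper and all names in the block are assumptions added for illustration, not part of the original notebook.

```python
from collections import Counter
from math import radians, sin, cos, asin, sqrt

# Rows copied from the stations table: (neighborhood, station, latitude, longitude)
stations_rows = [
    ('Chiaia', 'Piazza Amedeo', 40.833303, 14.221684),
    ('Fuorigrotta', 'Cavalleggeri Aosta', 40.82256, 14.187219),
    ('Vomero', 'Vanvitelli', 40.843586, 14.222435),
    ('Vomero', 'Quattro giornate', 40.845884, 14.2231),
]

# Number of stations per neighborhood (missing neighborhoods count as 0)
stations_per_neighborhood = Counter(row[0] for row in stations_rows)

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    earth_radius_m = 6371000  # mean Earth radius
    dlat = radians(lat2 - lat1)
    dlon = radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * earth_radius_m * asin(sqrt(a))

# Distance between the two Vomero stations
vomero = [row for row in stations_rows if row[0] == 'Vomero']
distance_m = haversine_m(vomero[0][2], vomero[0][3], vomero[1][2], vomero[1][3])

print(stations_per_neighborhood)
print(round(distance_m))
```

The two stations turn out to be only a few hundred metres apart, so an office placed between them would be within a short walk of both.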

In [18]:
CLIENT_ID = 'B4YHWQTP5RAQF5BKNN5LZZMNTGQVBFAS3MVBH13OEX4QLWPI' # replace with your Foursquare ID
CLIENT_SECRET = '031GK1WSHTJS0NGOTGHC5RB53FGZQITT2BIHU3HOBJCL0HFK' # replace with your Foursquare Secret
VERSION = '20180605' # Foursquare API version

address = 'Vomero, Naples'

geolocator = Nominatim(user_agent="nap_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of the Vomero neighborhood are {}, {}.'.format(latitude, longitude))

The geographical coordinates of the Vomero neighborhood are 40.8438464, 14.2254818.


I will search the whole neighborhood, which is approximately 1 km in diameter, and use the results to decide where the best place to open the office is.

In [19]:
VERSION = '20180604'
LIMIT = 250
radius = 900
search_query = 'Restaurant'

url = 'https://api.foursquare.com/v2/venues/search?client_id={}&client_secret={}&ll={},{}&intent={}&v={}&query={}&radius={}&limit={}'.format(CLIENT_ID, CLIENT_SECRET, latitude, longitude, 'browse', VERSION, search_query, radius, LIMIT)
url

'https://api.foursquare.com/v2/venues/search?client_id=B4YHWQTP5RAQF5BKNN5LZZMNTGQVBFAS3MVBH13OEX4QLWPI&client_secret=031GK1WSHTJS0NGOTGHC5RB53FGZQITT2BIHU3HOBJCL0HFK&ll=40.8438464,14.2254818&intent=browse&v=20180604&query=Restaurant&radius=900&limit=250'
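
The same query string can also be assembled from a dictionary of parameters, which avoids counting `{}` placeholders by hand. This is a sketch using only the standard library; the credentials are placeholders, and `params`, `base_url` and `search_url` are names introduced here for illustration.

```python
from urllib.parse import urlencode

# Placeholder credentials (assumption: substitute your own Foursquare keys)
params = {
    'client_id': 'YOUR_CLIENT_ID',
    'client_secret': 'YOUR_CLIENT_SECRET',
    'll': '40.8438464,14.2254818',  # latitude,longitude of Vomero from the geocoder
    'intent': 'browse',
    'v': '20180604',
    'query': 'Restaurant',
    'radius': 900,
    'limit': 250,
}

base_url = 'https://api.foursquare.com/v2/venues/search'
# urlencode escapes each value; the comma in ll becomes %2C
search_url = base_url + '?' + urlencode(params)
print(search_url)
```

Passing the same dictionary as `requests.get(base_url, params=params)` should let `requests` do this encoding itself.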

In [20]:
results = requests.get(url).json()
results

{'meta': {'code': 200, 'requestId': '5c4f15b8f594df20effa9949'},
 'response': {'venues': [{'categories': [{'icon': {'prefix': 'https://ss3.4sqi.net/img/categories_v2/food/japanese_',
       'suffix': '.png'},
      'id': '4bf58dd8d48988d111941735',
      'name': 'Japanese Restaurant',
      'pluralName': 'Japanese Restaurants',
      'primary': True,
      'shortName': 'Japanese'}],
    'hasPerk': False,
    'id': '5b264e595455b20039ca02e0',
    'location': {'address': 'via Gianlorenzo Bernini 17',
     'cc': 'IT',
     'city': 'Napoli',
     'country': 'Italia',
     'distance': 540,
     'formattedAddress': ['via Gianlorenzo Bernini 17',
      '80129 Napoli Campania',
      'Italia'],
     'labeledLatLngs': [{'label': 'display',
       'lat': 40.843086,
       'lng': 14.231825}],
     'lat': 40.843086,
     'lng': 14.231825,
     'neighborhood': 'Vomero',
     'postalCode': '80129',
     'state': 'Campania'},
    'name': 'Sumo Sushi - Japanese Restaurant',
    'referralId': 'v-154868

In [21]:
venues = results['response']['venues']

dataframe = json_normalize(venues)
dataframe.head()

Unnamed: 0,categories,hasPerk,id,location.address,location.cc,location.city,location.country,location.distance,location.formattedAddress,location.labeledLatLngs,location.lat,location.lng,location.neighborhood,location.postalCode,location.state,name,referralId
0,"[{'pluralName': 'Japanese Restaurants', 'icon'...",False,5b264e595455b20039ca02e0,via Gianlorenzo Bernini 17,IT,Napoli,Italia,540,"[via Gianlorenzo Bernini 17, 80129 Napoli Camp...","[{'lng': 14.231825, 'lat': 40.843086, 'label':...",40.843086,14.231825,Vomero,80129.0,Campania,Sumo Sushi - Japanese Restaurant,v-1548686776
1,"[{'pluralName': 'Japanese Restaurants', 'icon'...",False,59d91e559ef8ef736a5dd160,"Via Gioacchino Rossini, 1",IT,Napoli,Italia,464,"[Via Gioacchino Rossini, 1, 80128 Napoli Campa...","[{'lng': 14.223535, 'lat': 40.847755, 'label':...",40.847755,14.223535,,80128.0,Campania,Nagoya Japanese Restaurant,v-1548686776
2,"[{'pluralName': 'Restaurants', 'icon': {'suffi...",False,4ed2480f722e01c58494bd88,,IT,Napoli,Italia,837,"[Napoli Campania, Italia]","[{'lng': 14.230110244508099, 'lat': 40.8371844...",40.837184,14.23011,,,Campania,George's Restaurant,v-1548686776
3,"[{'pluralName': 'Restaurants', 'icon': {'suffi...",False,4eadeac5a17c199864ed4b88,corso vittorio emanuele 141,IT,Napoli,Italia,886,"[corso vittorio emanuele 141, Napoli Campania,...","[{'lng': 14.231302441261596, 'lat': 40.8372087...",40.837209,14.231302,,,Campania,Veritas Restaurant,v-1548686776
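
To illustrate where dotted column names such as `location.lat` come from, here is a minimal sketch of flattening a nested record. It is not the pandas implementation of `json_normalize`, only a simplified stand-in; the `flatten` helper is hypothetical and the venue is a trimmed copy of the first result above.

```python
def flatten(record, prefix=''):
    """Flatten nested dicts into one dict with dot-separated keys."""
    flat = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into nested dicts, extending the dotted prefix
            flat.update(flatten(value, name + '.'))
        else:
            # Lists and scalars are kept as they are
            flat[name] = value
    return flat

# A trimmed copy of the first venue in the response above
venue = {
    'name': 'Sumo Sushi - Japanese Restaurant',
    'categories': [{'name': 'Japanese Restaurant'}],
    'location': {'lat': 40.843086, 'lng': 14.231825, 'distance': 540},
}

flat_venue = flatten(venue)
print(sorted(flat_venue))
```

Note that the `categories` list is left untouched, which is why that column of the dataframe still holds a list of dictionaries.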


In [22]:
# keep only the columns with the venue name, its categories, its id and anything associated with its location
filtered_columns = ['name', 'categories'] + [col for col in dataframe.columns if col.startswith('location.')] + ['id']
# .copy() gives an independent dataframe, so the assignment below does not touch the original
dataframe_filtered = dataframe.loc[:, filtered_columns].copy()

# function that extracts the category of the venue
def get_category_type(row):
    try:
        categories_list = row['categories']
    except KeyError:
        # fall back to the column name used when the venue data is nested under 'venue'
        categories_list = row['venue.categories']
        
    if len(categories_list) == 0:
        return None
    else:
        return categories_list[0]['name']

# replace the list of categories in each row with the name of the first category
dataframe_filtered['categories'] = dataframe_filtered.apply(get_category_type, axis=1)

# clean column names by keeping only last term
dataframe_filtered.columns = [column.split('.')[-1] for column in dataframe_filtered.columns]

dataframe_filtered

Unnamed: 0,name,categories,address,cc,city,country,distance,formattedAddress,labeledLatLngs,lat,lng,neighborhood,postalCode,state,id
0,Sumo Sushi - Japanese Restaurant,Japanese Restaurant,via Gianlorenzo Bernini 17,IT,Napoli,Italia,540,"[via Gianlorenzo Bernini 17, 80129 Napoli Camp...","[{'lng': 14.231825, 'lat': 40.843086, 'label':...",40.843086,14.231825,Vomero,80129.0,Campania,5b264e595455b20039ca02e0
1,Nagoya Japanese Restaurant,Japanese Restaurant,"Via Gioacchino Rossini, 1",IT,Napoli,Italia,464,"[Via Gioacchino Rossini, 1, 80128 Napoli Campa...","[{'lng': 14.223535, 'lat': 40.847755, 'label':...",40.847755,14.223535,,80128.0,Campania,59d91e559ef8ef736a5dd160
2,George's Restaurant,Restaurant,,IT,Napoli,Italia,837,"[Napoli Campania, Italia]","[{'lng': 14.230110244508099, 'lat': 40.8371844...",40.837184,14.23011,,,Campania,4ed2480f722e01c58494bd88
3,Veritas Restaurant,Restaurant,corso vittorio emanuele 141,IT,Napoli,Italia,886,"[corso vittorio emanuele 141, Napoli Campania,...","[{'lng': 14.231302441261596, 'lat': 40.8372087...",40.837209,14.231302,,,Campania,4eadeac5a17c199864ed4b88


I will drop the labeledLatLngs column and replace the NaN values in the neighborhood column with 'Vomero', because we know that these venues are in that neighborhood. The other NaN values are fine, because they all concern the address and we already have the lat and lng.

In [23]:
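dataframe_filtered.drop(['labeledLatLngs'], axis=1, inplace=True)
# fillna returns a new Series, so assign it back to make the replacement persist
dataframe_filtered['neighborhood'] = dataframe_filtered['neighborhood'].fillna('Vomero')
dataframe_filtered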

Unnamed: 0,name,categories,address,cc,city,country,distance,formattedAddress,lat,lng,neighborhood,postalCode,state,id
0,Sumo Sushi - Japanese Restaurant,Japanese Restaurant,via Gianlorenzo Bernini 17,IT,Napoli,Italia,540,"[via Gianlorenzo Bernini 17, 80129 Napoli Camp...",40.843086,14.231825,Vomero,80129.0,Campania,5b264e595455b20039ca02e0
1,Nagoya Japanese Restaurant,Japanese Restaurant,"Via Gioacchino Rossini, 1",IT,Napoli,Italia,464,"[Via Gioacchino Rossini, 1, 80128 Napoli Campa...",40.847755,14.223535,Vomero,80128.0,Campania,59d91e559ef8ef736a5dd160
2,George's Restaurant,Restaurant,,IT,Napoli,Italia,837,"[Napoli Campania, Italia]",40.837184,14.23011,Vomero,,Campania,4ed2480f722e01c58494bd88
3,Veritas Restaurant,Restaurant,corso vittorio emanuele 141,IT,Napoli,Italia,886,"[corso vittorio emanuele 141, Napoli Campania,...",40.837209,14.231302,Vomero,,Campania,4eadeac5a17c199864ed4b88


In [24]:
vomero_map = folium.Map(location=[latitude, longitude], zoom_start=15)

for lat, lng, label in zip(dataframe_filtered.lat, dataframe_filtered.lng, dataframe_filtered.categories):
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        color='blue',
        popup=label,
        fill = True,
        fill_color='blue',
        fill_opacity=0.6
    ).add_to(vomero_map)
    
vomero_map
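
Alongside the map, a quick tally gives a numeric view of the same data. The sketch below uses only the four rows shown in the filtered dataframe; the 600 m walking threshold and the variable names are assumptions chosen for illustration.

```python
from collections import Counter

# (name, category, distance in metres) copied from the filtered dataframe
venue_rows = [
    ('Sumo Sushi - Japanese Restaurant', 'Japanese Restaurant', 540),
    ('Nagoya Japanese Restaurant', 'Japanese Restaurant', 464),
    ("George's Restaurant", 'Restaurant', 837),
    ('Veritas Restaurant', 'Restaurant', 886),
]

# How many venues fall into each category
category_counts = Counter(category for _, category, _ in venue_rows)

# Venues within a hypothetical walking distance of the search centre
max_walk_m = 600  # assumption: threshold chosen for illustration only
walkable = [name for name, _, distance in venue_rows if distance <= max_walk_m]

print(category_counts.most_common())
print(walkable)
```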

## Results

According to the map, the best place to open an office is between two neighborhoods: Vomero and Chiaia.
In this area there are several restaurants and also a park where the employees can enjoy their breaks.

## Discussion

It is better to have a commercial account when you use the Foursquare API; otherwise the data you retrieve is never enough for a good analysis.
The Folium library is a really good way to explore the data retrieved with the Foursquare API. It gives you a better understanding of the area you're analysing and makes the analysis much simpler, because you get visual feedback on the data.

## Conclusion

First I retrieved the data about the neighborhoods from Wikipedia and built a dataframe from it.
For a more in-depth analysis I retrieved data from a second website, which describes the best neighborhoods to live in in Naples, and compared the two dataframes to keep only the best neighborhoods that actually exist.
I mapped the train stations of each neighborhood and found that the most easily reachable one is Vomero, because it has two stations within a radius of 250 m. So I decided to analyse this neighborhood. Finally, with the Foursquare API I retrieved the data about nearby restaurants. There are many restaurants in that area, with food from all over the world (Japan, India, etc.), so the employees have a good choice of what to eat during their lunch break. There is also a park where they can enjoy their break.
So it's definitely a good place to open an office!