# Geneva House Renting Consultancy
Capstone Project, Applied Data Science Capstone by IBM/Coursera, Simone Lisi.

## Finding Geneva's points of interest via Foursquare
In this notebook, we look for points of interest (POIs) in Geneva.

According to a study published in the journal <a href="https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0164553">PLoS One</a> (2016; 11(10): e0164553), the proximity of certain locations to a property can influence its price to some degree. The study quantitatively addresses housing prices in the Chinese city of Shenzhen, but its principles can reasonably be assumed to apply more generally. The idea is that the value of a property can benefit from the proximity of facilities such as schools, hospitals, public transport, parks, and markets.
The results are stored in a dictionary of DataFrames (one per Foursquare category) and saved as CSV files in the `geneva_POI` folder for later use.
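To make "proximity" measurable, one option is the great-circle (haversine) distance between a property and the nearest POI. The sketch below is only an illustration under our own assumptions: `haversine_m` and `nearest_poi_distance_m` are hypothetical helpers, not part of the study or of any library, and the Earth is treated as a sphere with a mean radius of 6,371 km.

```python
import math

EARTH_RADIUS_M = 6_371_000  # assumption: spherical Earth, mean radius in metres


def haversine_m(lat1, lng1, lat2, lng2):
    """Great-circle distance in metres between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lng2 - lng1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_poi_distance_m(lat, lng, pois):
    """Distance in metres from (lat, lng) to the closest POI; `pois` is a non-empty list of (lat, lng)."""
    return min(haversine_m(lat, lng, p_lat, p_lng) for p_lat, p_lng in pois)


# hypothetical usage: distance from Geneva's centre to the closer of two high schools listed below
print(nearest_poi_distance_m(46.2017559, 6.1466014, [(46.200947, 6.151637), (46.202575, 6.153011)]))
```

At the scale of a city, the spherical approximation is usually precise enough for ranking properties by proximity.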

Some cells are enclosed markdown cells, starting and ending with a line of slashes:
//////////////////// //////////////////// ////////////////////

These cells are set as markdown because they should only be run when we want to scrape new data or install missing packages; in that case, switch them to 'code'. Otherwise, the notebook loads the data previously scraped and stored.
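A possible alternative to switching cells between markdown and code is a single boolean flag. This is a minimal sketch, not the approach used in this notebook: `get_poi_data` and `SCRAPE_NEW_DATA` are hypothetical names, and the scrape, save, and load steps are passed in as callables so the helper does not depend on Foursquare.

```python
SCRAPE_NEW_DATA = False  # hypothetical flag: set to True to query Foursquare again


def get_poi_data(scrape_new, scrape_fn, save_fn, load_fn):
    """Scrape and save fresh data when scrape_new is True, otherwise load the stored copy."""
    if scrape_new:
        data = scrape_fn()  # e.g. one Foursquare request per category
        save_fn(data)       # persist it so later runs can skip scraping
        return data
    return load_fn()        # reuse the previously scraped data
```

With the functions defined further down in the notebook, a call could look like `results_dic = get_poi_data(SCRAPE_NEW_DATA, lambda: {cat_id: getVenuesbycat(latitude, longitude, cat_id, 8000) for cat_id in foursquare_POI_cat_id}, dic_save, dic_load)`.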

##  //////////////////// //////////////////// ////////////////////
### !!! Set this cell to 'code' if certain packages are not installed yet.
!conda install -c anaconda lxml --yes

!conda install -c conda-forge geopy --yes # needed only if you haven't completed the Foursquare API lab

!conda install -c conda-forge folium=0.5.0 --yes # needed only if you haven't completed the Foursquare API lab
##  //////////////////// //////////////////// ////////////////////


In [2]:
## importing libraries

import numpy as np # library to handle data in a vectorized manner
import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
import json # library to handle JSON files
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values
import requests # library to handle requests
from pandas import json_normalize # transform JSON data into a pandas DataFrame (pandas >= 1.0)
# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors
#import k-means from clustering stage
from sklearn.cluster import KMeans
import folium # map rendering library



%matplotlib inline

In [3]:
### Geneva coordinates
address = 'Geneva, Switzerland'

geolocator = Nominatim(user_agent="To_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Geneva are {}, {}.'.format(latitude, longitude))

The geographical coordinates of Geneva are 46.2017559, 6.1466014.


In [4]:
#hidden foursquare credentials
CLIENT_ID = 'xxxx'# your Foursquare ID
CLIENT_SECRET = 'xxxx' # your Foursquare Secret

VERSION = '20180605' # Foursquare API version
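LIMIT = 100 # maximum number of venues requested per call
radius = 8000 # search radius in metres (the scraping cell passes 8000 directly)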
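To avoid pasting secrets into the notebook, the credentials could instead be read from environment variables. A sketch, assuming the hypothetical variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` (our choice, not something Foursquare defines); the `'xxxx'` placeholders are kept as fallbacks.

```python
import os

# hypothetical environment-variable names; fall back to the placeholders used above
CLIENT_ID = os.environ.get('FOURSQUARE_CLIENT_ID', 'xxxx')
CLIENT_SECRET = os.environ.get('FOURSQUARE_CLIENT_SECRET', 'xxxx')


def credentials_configured(client_id, client_secret):
    """Return True if both credentials are set and are not the 'xxxx' placeholders."""
    return all(value and value != 'xxxx' for value in (client_id, client_secret))


if not credentials_configured(CLIENT_ID, CLIENT_SECRET):
    print('Foursquare credentials missing: only previously stored data can be used.')
```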

In [5]:
# we create a dictionary mapping the Foursquare category ids to the selected POI categories
foursquare_POI_cat_id = {
    '52e81612bcbc57f1066b7a46' : 'Private_School',
    '4f4533814b9074f6e4fb0106' : 'Middle_School',
    '4bf58dd8d48988d13d941735' : 'High_School',
    '4f4533814b9074f6e4fb0107' : 'Nursery_School',
    '4f4533804b9074f6e4fb0105' : 'Elementary_School',
    '4bf58dd8d48988d196941735' : 'Hospital',
    '4bf58dd8d48988d1fe931735' : 'Bus_Station',
    '4bf58dd8d48988d12b951735' : 'Bus_Line',
    '52f2ab2ebcbc57f1066b8b4f' : 'Bus_Stop',
    '4bf58dd8d48988d129951735' : 'Train_Station',
    '4f4531504b9074f6e4fb0102' : 'Platform',
    '4bf58dd8d48988d12a951735' : 'Train',
    '52f2ab2ebcbc57f1066b8b51' : 'Tram_Station',
    '4bf58dd8d48988d163941735' : 'Park',
    '50be8ee891d4fa8dcc7199a7' : 'Market',
    '52f2ab2ebcbc57f1066b8b46' : 'Supermarket'
}

# and a dictionary with the same keys and colors as items for later plots
foursquare_POI_cat_id_col = {
    '52e81612bcbc57f1066b7a46' : 'orange',
    '4f4533814b9074f6e4fb0106' : 'orange',
    '4bf58dd8d48988d13d941735' : 'orange',
    '4f4533814b9074f6e4fb0107' : 'orange',
    '4f4533804b9074f6e4fb0105' : 'orange',
    '4bf58dd8d48988d196941735' : 'black',
    '4bf58dd8d48988d1fe931735' : 'red',
    '4bf58dd8d48988d12b951735' : 'red',
    '52f2ab2ebcbc57f1066b8b4f' : 'red',
    '4bf58dd8d48988d129951735' : 'red',
    '4f4531504b9074f6e4fb0102' : 'red',
    '4bf58dd8d48988d12a951735' : 'red',
    '52f2ab2ebcbc57f1066b8b51' : 'red',
    '4bf58dd8d48988d163941735' : 'green',
    '50be8ee891d4fa8dcc7199a7' : 'blue',
    '52f2ab2ebcbc57f1066b8b46' : 'blue'
}
   
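# sanity check: no empty category id, and both dictionaries share the same keys
if '' in foursquare_POI_cat_id_col or set(foursquare_POI_cat_id) != set(foursquare_POI_cat_id_col):
    print('Warning: check the category ids in foursquare_POI_cat_id_col')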
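The color dictionary already encodes the POI groups used in the maps below (orange: schools, black: hospitals, red: transport, green: parks, blue: markets). A sketch of inverting it, so the key lists would not have to be typed by hand; `group_keys_by_color` is a hypothetical helper. Note that the red group also contains Platform and Train, which the transport map's hand-written key list leaves out, so the two approaches do not give identical maps.

```python
from collections import defaultdict


def group_keys_by_color(cat_colors):
    """Invert a {category_id: color} mapping into {color: [category_id, ...]}, keeping insertion order."""
    groups = defaultdict(list)
    for cat_id, color in cat_colors.items():
        groups[color].append(cat_id)
    return dict(groups)


# hypothetical usage with the dictionaries defined above:
# group_keys_by_color(foursquare_POI_cat_id_col)['green']  -> ['4bf58dd8d48988d163941735'] (Park)
```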

In [6]:
def getVenuesbycat(lat, lng, cat_id, radius=500, retries=2):
    """Return a DataFrame with the venues of category `cat_id` within `radius` metres of (lat, lng)."""
    # build the API request (requests encodes the query parameters)
    url = 'https://api.foursquare.com/v2/venues/explore'
    params = {
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'v': VERSION,
        'll': '{},{}'.format(lat, lng),  # use the function arguments, not the global coordinates
        'radius': radius,
        'limit': LIMIT,
        'categoryId': cat_id,
    }

    # make the GET request, trying again if it fails (up to `retries` attempts in total)
    for attempt in range(retries):
        try:
            results = requests.get(url, params=params).json()['response']['groups'][0]['items']
            break
        except (requests.RequestException, ValueError, KeyError, IndexError):
            if attempt == retries - 1:
                raise

    # keep only the relevant information for each nearby venue
    rows = [(lat,
             lng,
             v['venue']['name'],
             v['venue']['location']['lat'],
             v['venue']['location']['lng'],
             v['venue']['categories'][0]['name'],
             v['venue']['categories'][0]['id']) for v in results]

    # passing the column names here also works when no venue is returned
    nearby_venues = pd.DataFrame(rows, columns=['Neighborhood Latitude',
                                                'Neighborhood Longitude',
                                                'Venue',
                                                'Venue Latitude',
                                                'Venue Longitude',
                                                'Venue Category',
                                                'categoryId'])
    return nearby_venues
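Because the request needs network access and valid credentials, the parsing step cannot be checked offline. The sketch below isolates it so it can be tried on a hand-made item; the item structure is assumed from the fields the function reads, and `extract_venue_rows` is a hypothetical helper. Unlike the list comprehension above, it returns `None` for venues without a category instead of raising an `IndexError`.

```python
def extract_venue_rows(items, lat, lng):
    """Turn the 'items' of an explore response into tuples ordered like the DataFrame columns.

    Assumes each item looks like {'venue': {'name': ..., 'location': {'lat': ..., 'lng': ...},
    'categories': [{'name': ..., 'id': ...}]}}, i.e. only the fields read by getVenuesbycat.
    """
    rows = []
    for item in items:
        venue = item['venue']
        category = (venue.get('categories') or [{}])[0]  # tolerate venues without a category
        rows.append((lat,
                     lng,
                     venue['name'],
                     venue['location']['lat'],
                     venue['location']['lng'],
                     category.get('name'),
                     category.get('id')))
    return rows


# hypothetical, hand-written item mimicking the structure above
fake_items = [{'venue': {'name': 'Collège Calvin',
                         'location': {'lat': 46.200947, 'lng': 6.151637},
                         'categories': [{'name': 'High School', 'id': '4bf58dd8d48988d13d941735'}]}}]
print(extract_venue_rows(fake_items, 46.2017559, 6.1466014))
```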

##  //////////////////// //////////////////// ////////////////////
### !!! Set this cell to 'code' only the first time the notebook is run (or to scrape new data).
### In that case, also uncomment the line #dic_save(results_dic) in the next cell.
### Otherwise, leave this cell as markdown and run the next cell as is: it loads the stored data.

results_dic = {}
for cat_id in foursquare_POI_cat_id.keys():
    results_dic[cat_id] = getVenuesbycat(latitude, longitude, cat_id, 8000)

print(type(results_dic))

##  //////////////////// //////////////////// ////////////////////


In [None]:
#### save the dictionary of DataFrames for later use
# uncomment dic_save(results_dic) below to save newly scraped data
import ast
import os
root = 'geneva_POI'


def dic_save(dic):
    """Save each DataFrame as data_<key>.csv and the list of keys as key.txt."""
    os.makedirs(root, exist_ok=True)  # create the folder if it does not exist yet
    for key, val in dic.items():
        val.to_csv(os.path.join(root, 'data_{}.csv'.format(str(key))), index=False)

    with open(os.path.join(root, 'key.txt'), "w") as f: # saving keys to file
        f.write(str(list(dic.keys())))

def dic_load():
    """Read the list of keys, then load the corresponding CSV files."""
    with open(os.path.join(root, 'key.txt'), "r") as f:
        keys = ast.literal_eval(f.read())  # safer than eval for a list literal

    dic = {}
    for key in keys:
        dic[key] = pd.read_csv(os.path.join(root, 'data_{}.csv'.format(str(key))))

    return dic

# uncomment here to save the newly scraped data.
#dic_save(results_dic)
results_dic = dic_load()

for key in results_dic.keys():
    if('Unnamed: 0.1' in results_dic[key].columns):
        results_dic[key].drop(axis=1, labels = 'Unnamed: 0.1', inplace = True)
    if('Unnamed: 0' in results_dic[key].columns):
        results_dic[key].drop(axis=1, labels = 'Unnamed: 0', inplace = True)
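The current approach writes one CSV per category plus a `key.txt` file. An alternative sketch keeps everything in a single CSV with an extra `key` column. The function names are hypothetical; two caveats apply: a category with no venues has no rows and therefore disappears on reload, and a DataFrame that already has a `key` column would clash with the added one.

```python
import pandas as pd


def dic_save_single(dic, path):
    """Save a non-empty {key: DataFrame} dictionary as one CSV file with an extra 'key' column."""
    frames = [df.assign(key=key) for key, df in dic.items()]
    pd.concat(frames, ignore_index=True).to_csv(path, index=False)


def dic_load_single(path):
    """Rebuild the {key: DataFrame} dictionary written by dic_save_single."""
    combined = pd.read_csv(path, dtype={'key': str})
    return {key: group.drop(columns='key').reset_index(drop=True)
            for key, group in combined.groupby('key', sort=False)}
```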
        



In [34]:
# Check what the DataFrame associated with a key looks like
results_dic['4bf58dd8d48988d13d941735'].head()


| | Neighborhood Latitude | Neighborhood Longitude | Venue | Venue Latitude | Venue Longitude | Venue Category | categoryId |
|---|---|---|---|---|---|---|---|
| 0 | 46.201756 | 6.146601 | Collège Calvin | 46.200947 | 6.151637 | High School | 4bf58dd8d48988d13d941735 |
| 1 | 46.201756 | 6.146601 | Collège de Candolle | 46.202575 | 6.153011 | High School | 4bf58dd8d48988d13d941735 |
| 2 | 46.201756 | 6.146601 | Collège et École de Commerce Nicolas-Bouvier | 46.205566 | 6.132289 | High School | 4bf58dd8d48988d13d941735 |
| 3 | 46.201756 | 6.146601 | Collège Sismondi | 46.221281 | 6.141529 | High School | 4bf58dd8d48988d13d941735 |
| 4 | 46.201756 | 6.146601 | ECG Henry-Dunant | 46.213317 | 6.118221 | High School | 4bf58dd8d48988d13d941735 |


In [9]:
# print the categories for which no venue was found
for key in results_dic.keys():
    df = results_dic[key]
    if df.empty or df['Venue'].iloc[0] == 'missing':
        print(foursquare_POI_cat_id[key])
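The check above only reports categories without results. A slightly more informative sketch counts the venues per category, so that categories with very few results stand out; a count equal to `LIMIT` may also hint that a request was truncated. `venues_per_category` is a hypothetical helper.

```python
import pandas as pd


def venues_per_category(results, names):
    """Number of venues per category id, labelled with the readable name when one is known."""
    counts = {names.get(key, key): len(df) for key, df in results.items()}
    return pd.Series(counts, name='venues', dtype=int).sort_values()


# hypothetical usage with the notebook's objects:
# venues_per_category(results_dic, foursquare_POI_cat_id)
```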

In [28]:
## we define a dictionary with the coordinates of a few relevant sites in Geneva
key_sites_dic = {
    'cornavin' : [46.2106, 6.1425310072],
    'carouge' : [46.18733, 6.13750521],    
    'eauxvives' : [46.202, 6.160],
    'plainpalais' : [46.199, 6.142],
    'champel' : [46.1931, 6.158]
}

# and a dictionary to color code them in a map
key_sites_dic_col = {
    'cornavin' : 'red',
    'carouge' : 'green',    
    'eauxvives' : 'purple',
    'plainpalais' : 'yellow',
    'champel' : 'brown'
}
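With the key sites defined, one way to connect the maps to the idea of proximity is to count the POIs within walking distance of each site. This is a vectorised NumPy sketch using the haversine distance on a spherical Earth; `count_pois_near_sites` is a hypothetical helper and the 500 m default radius is an arbitrary assumption.

```python
import numpy as np

EARTH_RADIUS_M = 6_371_000  # assumption: spherical Earth, mean radius in metres


def count_pois_near_sites(sites, poi_lat, poi_lng, radius_m=500):
    """Count, for each site {name: [lat, lng]}, the POIs within radius_m metres."""
    site_lat, site_lng = np.radians(np.array(list(sites.values()), dtype=float)).T
    p_lat = np.radians(np.asarray(poi_lat, dtype=float))
    p_lng = np.radians(np.asarray(poi_lng, dtype=float))
    # broadcast sites (rows) against POIs (columns) and apply the haversine formula
    d_phi = p_lat[None, :] - site_lat[:, None]
    d_lambda = p_lng[None, :] - site_lng[:, None]
    a = (np.sin(d_phi / 2) ** 2
         + np.cos(site_lat[:, None]) * np.cos(p_lat[None, :]) * np.sin(d_lambda / 2) ** 2)
    dist = 2 * EARTH_RADIUS_M * np.arcsin(np.sqrt(a))
    # NaN coordinates give NaN distances, which are never counted
    return dict(zip(sites, (dist <= radius_m).sum(axis=1).tolist()))
```

For example, `count_pois_near_sites(key_sites_dic, results_dic[key]['Venue Latitude'], results_dic[key]['Venue Longitude'])` would give one count per key site for the category `key`.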




### We generate a map for each group of POIs

In [29]:
#### Schools
school_keys= ['52e81612bcbc57f1066b7a46','4f4533814b9074f6e4fb0106','4bf58dd8d48988d13d941735','4f4533814b9074f6e4fb0107', '4f4533804b9074f6e4fb0105'  ]
school_keys_color= {'52e81612bcbc57f1066b7a46' : 'blue','4f4533814b9074f6e4fb0106' : 'blue',
                    '4bf58dd8d48988d13d941735' : 'blue',
                    '4f4533814b9074f6e4fb0107' : 'red','4f4533804b9074f6e4fb0105' : 'red'  }

map_geneva = folium.Map(location=[latitude, longitude],width=600,height=400,control_scale = True,zoom_start=13)


# add markers to map
for key in school_keys:
    for lat, lng, name in zip(results_dic[key]['Venue Latitude'], results_dic[key]['Venue Longitude'], results_dic[key]['Venue']):
        try:
            label = '{}, {}'.format(foursquare_POI_cat_id[key], name )
            label = folium.Popup(label, parse_html=True)
            folium.CircleMarker(
                [lat, lng],
                radius=5,
                popup=label,
                color=school_keys_color[key],  # blue: private/middle/high schools, red: nursery/elementary
                fill=True,
                fill_opacity=0.7,
                parse_html=False).add_to(map_geneva)
        except Exception:
            continue  # skip venues with invalid data (e.g. missing coordinates)
            
            
for key in key_sites_dic:
    label = '{}'.format(key.title() )
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [key_sites_dic[key][0], key_sites_dic[key][1]],
        radius=10,
        popup=label,
        color=key_sites_dic_col[key],
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(map_geneva)  
        
        
loc = 'Schools'
title_html = '''
             <h3 align="left" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   
map_geneva.get_root().html.add_child(folium.Element(title_html))
    
map_geneva

In [30]:
#### Transports

transports_keys= ['4bf58dd8d48988d1fe931735', '4bf58dd8d48988d12b951735', '52f2ab2ebcbc57f1066b8b4f', '4bf58dd8d48988d129951735', '52f2ab2ebcbc57f1066b8b51' ]
map_geneva = folium.Map(location=[latitude, longitude],width=600,height=400,control_scale = True,zoom_start=13)

# add markers to map
for key in transports_keys:
    for lat, lng, name in zip(results_dic[key]['Venue Latitude'], results_dic[key]['Venue Longitude'], results_dic[key]['Venue']):
        label = '{}, {}'.format(foursquare_POI_cat_id[key], name )
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='grey',
            fill=True,
            fill_opacity=0.7,
            parse_html=False).add_to(map_geneva)  
        
        
for key in key_sites_dic:
    label = '{}'.format(key.title() )
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [key_sites_dic[key][0], key_sites_dic[key][1]],
        radius=10,
        popup=label,
        color=key_sites_dic_col[key],
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(map_geneva)  
        

loc = 'Transports'
title_html = '''
             <h3 align="left" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   
map_geneva.get_root().html.add_child(folium.Element(title_html))
    
    
map_geneva

In [31]:
## Parks   
parks_keys= ['4bf58dd8d48988d163941735']
map_geneva = folium.Map(location=[latitude, longitude],width=600,height=400,control_scale = True,zoom_start=13)

# add markers to map
for key in parks_keys:
    for lat, lng, name in zip(results_dic[key]['Venue Latitude'], results_dic[key]['Venue Longitude'], results_dic[key]['Venue']):
        label = '{}, {}'.format(foursquare_POI_cat_id[key], name )
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='grey',
            fill=True,
            fill_opacity=0.7,
            parse_html=False).add_to(map_geneva)  
        
        
for key in key_sites_dic:
    label = '{}'.format(key.title() )
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [key_sites_dic[key][0], key_sites_dic[key][1]],
        radius=10,
        popup=label,
        color=key_sites_dic_col[key],
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(map_geneva)  
        

loc = 'Parks'
title_html = '''
             <h3 align="left" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   
map_geneva.get_root().html.add_child(folium.Element(title_html))
    
map_geneva

In [32]:
## Markets
markets_keys= ['50be8ee891d4fa8dcc7199a7', '52f2ab2ebcbc57f1066b8b46']
map_geneva = folium.Map(location=[latitude, longitude],width=600,height=400,control_scale = True,zoom_start=13)

# add markers to map
for key in markets_keys:
    for lat, lng, name in zip(results_dic[key]['Venue Latitude'], results_dic[key]['Venue Longitude'], results_dic[key]['Venue']):
        label = '{}, {}'.format(foursquare_POI_cat_id[key], name )
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='grey',
            fill=True,
            fill_opacity=0.7,
            parse_html=False).add_to(map_geneva)  
        
        
for key in key_sites_dic:
    label = '{}'.format(key.title() )
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [key_sites_dic[key][0], key_sites_dic[key][1]],
        radius=10,
        popup=label,
        color=key_sites_dic_col[key],
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(map_geneva)  
        

loc = 'Markets'
title_html = '''
             <h3 align="left" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   
map_geneva.get_root().html.add_child(folium.Element(title_html))    
map_geneva

In [33]:
## Hospitals

hospital_keys= ['4bf58dd8d48988d196941735']
map_geneva = folium.Map(location=[latitude, longitude],width=600,height=400,control_scale = True,zoom_start=13)

# add markers to map
for key in hospital_keys:
    for lat, lng, name in zip(results_dic[key]['Venue Latitude'], results_dic[key]['Venue Longitude'], results_dic[key]['Venue']):
        label = '{}, {}'.format(foursquare_POI_cat_id[key], name )
        label = folium.Popup(label, parse_html=True)
        folium.CircleMarker(
            [lat, lng],
            radius=5,
            popup=label,
            color='grey',
            fill=True,
            fill_opacity=0.7,
            parse_html=False).add_to(map_geneva)  
        
        
for key in key_sites_dic:
    label = '{}'.format(key.title() )
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [key_sites_dic[key][0], key_sites_dic[key][1]],
        radius=10,
        popup=label,
        color=key_sites_dic_col[key],
        fill=True,
        fill_opacity=0.7,
        parse_html=False).add_to(map_geneva)  
        
loc = 'Hospitals'
title_html = '''
             <h3 align="left" style="font-size:16px"><b>{}</b></h3>
             '''.format(loc)   
map_geneva.get_root().html.add_child(folium.Element(title_html))   
    
map_geneva
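The five map cells above repeat the same marker and title code and differ only in the list of category ids and the title. A refactoring sketch into two hypothetical helpers, using only the folium calls already present in the notebook (`Map`, `CircleMarker`, `Popup`, `Element`, `get_root().html.add_child`). Folium is imported inside `draw_poi_map` so that the marker collection can be used and tested without it; rows with missing coordinates are skipped explicitly instead of being hidden by a bare `except`, and `parse_html=False` is left out of the markers since the popups already receive `parse_html=True`.

```python
import math


def _is_valid_coord(value):
    """True if value converts to a float that is not NaN."""
    try:
        return not math.isnan(float(value))
    except (TypeError, ValueError):
        return False


def markers_for_keys(results, keys, names, colors=None, default_color='grey'):
    """Collect (lat, lng, label, color) tuples for the venues of the given category ids."""
    colors = colors or {}
    markers = []
    for key in keys:
        df = results[key]
        for lat, lng, name in zip(df['Venue Latitude'], df['Venue Longitude'], df['Venue']):
            if not (_is_valid_coord(lat) and _is_valid_coord(lng)):
                continue  # skip rows with missing coordinates
            label = '{}, {}'.format(names[key], name)
            markers.append((float(lat), float(lng), label, colors.get(key, default_color)))
    return markers


def draw_poi_map(markers, sites, site_colors, center, title):
    """Draw POI markers and key sites on a folium map with an HTML title."""
    import folium  # imported here so markers_for_keys works without folium installed

    poi_map = folium.Map(location=center, width=600, height=400, control_scale=True, zoom_start=13)
    for lat, lng, label, color in markers:
        folium.CircleMarker([lat, lng], radius=5, popup=folium.Popup(label, parse_html=True),
                            color=color, fill=True, fill_opacity=0.7).add_to(poi_map)
    for site, (lat, lng) in sites.items():
        folium.CircleMarker([lat, lng], radius=10, popup=folium.Popup(site.title(), parse_html=True),
                            color=site_colors[site], fill=True, fill_opacity=0.7).add_to(poi_map)
    title_html = '<h3 align="left" style="font-size:16px"><b>{}</b></h3>'.format(title)
    poi_map.get_root().html.add_child(folium.Element(title_html))
    return poi_map
```

The school map, for instance, would become `draw_poi_map(markers_for_keys(results_dic, school_keys, foursquare_POI_cat_id, school_keys_color), key_sites_dic, key_sites_dic_col, [latitude, longitude], 'Schools')`.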