# Applied data science capstone: Data-based decision support to inform relocations

## Introduction
Description of the problem and business case


Relocating, that is, moving to a new place and making one's home there (for example because of a change of job), is a period of major change in which several important decisions have to be made. Where to live is probably among the most important of them. One's home location determines not only how much time is spent commuting to work or study and how big a home a given budget can buy, but also which services (grocery shops, restaurants, schools, cinemas, etc.) are easily accessible.

In many cases, the decision of where to relocate is made quickly or with limited information, especially when relocating far away, e.g. to another country.

This capstone aims to develop a data-based decision-support tool to help people who are in the process of relocating.

To simplify the decision of where to relocate, it is assumed to depend on the following parameters:

* **Composition of the neighbourhood**: a subjective criterion that depends on the individual preferences of the person relocating.
* **Size of the new apartment**: a function of the available budget and the chosen location (neighbourhood).
* **Commuting time**: modelled as a function of the distance between the chosen location and the commute destination (work/study).

For the purpose of this capstone, the user (i.e. the person relocating) will define their preferences and constraints in terms of:
* **Location (target neighbourhood) the user would like the new apartment's location to be similar to**: this can be the current apartment's location if the user finds it a comfortable neighbourhood.
* **Available budget**: this will be used to estimate the size of the apartment in a recommended location.
* **Location of work/study**: this will be used to estimate commuting time by computing the distance between the work/study location and the new apartment's location.

The main idea is that users provide i) a neighbourhood they like, ii) the city they are relocating to, iii) an available budget and iv) the location of work/study.

Given these parameters, the user will be presented with suggested neighbourhoods to relocate to. For each suggested neighbourhood, an estimated apartment size and daily commuting time will be calculated. This gives the user decision support, so that they can focus their apartment search on the recommended neighbourhoods.

This project could be extended so that not only neighbourhoods but also actual apartments are proposed to the user.

To suggest neighbourhoods for relocation, Foursquare location data will be used to characterize the 'target neighbourhood' as well as the neighbourhoods in the destination city. A clustering algorithm will then be applied to the set of neighbourhoods in the new city plus the target neighbourhood. Once the neighbourhoods similar to the target one are identified, commuting times and apartment sizes will be estimated from the user-provided information.


## Libraries
Before going further, let's import the libraries that will be used.

In [1]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

import json # library to handle JSON files

# Kept commented out: when the line is active, conda receives the trailing comment as arguments and fails.
#!conda install -c conda-forge geopy --yes # uncomment this line if you haven't completed the Foursquare API lab
from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests
from pandas.io.json import json_normalize # transform JSON file into a pandas dataframe

# Matplotlib and associated plotting modules
import matplotlib.cm as cm
import matplotlib.colors as colors

# import k-means from clustering stage
from sklearn.cluster import KMeans

#!conda install -c conda-forge folium=0.5.0 --yes # uncomment this line if you haven't completed the Foursquare API lab
import folium # map rendering library

print('Libraries imported.')


Libraries imported.


## Data
Where you describe the data that will be used to solve the problem and the source of the data.


### Neighbourhoods in Barcelona
We will assume that the user wants to relocate to Barcelona.

An overview of Barcelona's districts (each district contains several neighbourhoods) can be seen below: 
<img src="450px-Barcelona_districtes.svg.png" />

The coordinates of the different neighbourhoods in Barcelona will be extracted from <a href="https://en.wikipedia.org/wiki/Districts_of_Barcelona">this Wikipedia page</a>.

These coordinates will be used to explore the different neighbourhoods (plus the target one) with Foursquare.



In [2]:
## Geocode Barcelona to obtain the coordinates used to centre the maps
address = 'Barcelona, SPAIN'

geolocator = Nominatim(user_agent="ny_explorer")
location = geolocator.geocode(address)
latitude = location.latitude
longitude = location.longitude
print('The geographical coordinates of Barcelona are {}, {}.'.format(latitude, longitude))


The geographical coordinates of Barcelona are 41.3828939, 2.1774322.


In [3]:
# create map of Barcelona using latitude and longitude values
map_barcelona = folium.Map(location=[latitude, longitude], zoom_start=12)
map_barcelona

### Represent Barcelona's neighbourhoods
Here we load the coordinates of Barcelona's neighbourhoods into a pandas dataframe. The coordinates were taken from the Wikipedia page above and saved in an Excel file.

In [54]:
# load the neighbourhood coordinates from the Excel file
data2 = pd.read_excel('borough_BCN.xlsx', sheet_name='Sheet1')
data2.head()

Unnamed: 0,Borough,Neighborhood,Latitude,Longitude
0,Ciutat Vella,La Barceloneta,41.379889,2.189361
1,Ciutat Vella,El Gotic,41.382778,2.176944
2,Ciutat Vella,El Raval,41.379722,2.168056
3,Ciutat Vella,"Sant Pere, Santa Caterina i la Ribera",41.384608,2.182717
4,Eixample,L'Antiga Esquerra de l'Eixample,41.39,2.155


In [5]:
# create map of Barcelona using latitude and longitude values
map_barcelona = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
for lat, lng, label in zip(data2['Latitude'], data2['Longitude'], data2['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_barcelona)  
    
map_barcelona

### Sqm price of Barcelona neighbourhoods

The price per square metre of an apartment in Barcelona will be extracted from <a href="https://www.bcn.cat/estadistica/castella/dades/timm/ipreus/hab2mave/evo/t2mab.htm">https://www.bcn.cat/estadistica/castella/dades/timm/ipreus/hab2mave/evo/t2mab.htm</a>, which is provided by Barcelona's city council.

## Additional user information 
For the purpose of illustrating this capstone project, the following parameters will be assumed:
* **Location (target neighbourhood) the user would like the new apartment's location to be similar to**: a location in Madrid (a similar city) will be chosen.
* **New work location**: the user will be working close to 'Sants Station', the main train station in Barcelona.
* **User's available budget:** the user has a budget of 300,000 EUR to buy the apartment they will relocate to.

### Location (target neighbourhood)
The target location is defined below:

In [6]:
## Define the target neighbourhood and its coordinates
address2 = 'Madrid, Barrio de Salamanca, SPAIN.'
latitude2 = 40.43
longitude2 = -3.677778
print('The target Neighbourhood is', address2, 'Its geographical coordinates are', latitude2, ',', longitude2,'.' )

The target Neighbourhood is Madrid, Barrio de Salamanca, SPAIN. Its geographical coordinates are 40.43 , -3.677778 .


In [22]:
# create map of Madrid centred on the target neighbourhood
map_madrid = folium.Map(location=[latitude2, longitude2], zoom_start=12)

# add markers to map
folium.CircleMarker(
        [latitude2, longitude2],
        radius=5,
        popup='Barrio Salamanca',
        color='red',
        fill=True,
        fill_color='#3187cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_madrid)  
    
map_madrid

### New work location 
The new work location is defined here:

In [18]:
## Define the new work location and its coordinates
address3 = 'Barcelona, Sants Estacio, SPAIN.'
latitude3 = 41.380586
longitude3 = 2.140598
print('The new work location is', address3, 'Its geographical coordinates are', latitude3, ',', longitude3,'.' )


The new work location is Barcelona, Sants Estacio, SPAIN. Its geographical coordinates are 41.380586 , 2.140598 .


We can now show the new work location (red marker) on the map of Barcelona, together with the neighbourhoods (blue markers):

In [26]:
map_barcelona = folium.Map(location=[latitude, longitude], zoom_start=12)

# add markers to map
folium.CircleMarker(
        [latitude3, longitude3],
        radius=8,
        popup='New work location',
        color='red',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_barcelona)
# add markers to map
for lat, lng, label in zip(data2['Latitude'], data2['Longitude'], data2['Neighborhood']):
    label = folium.Popup(label, parse_html=True)
    folium.CircleMarker(
        [lat, lng],
        radius=5,
        popup=label,
        color='blue',
        fill=True,
        fill_color='#3186cc',
        fill_opacity=0.7,
        parse_html=False).add_to(map_barcelona)  

map_barcelona

## Methodology section 
It represents the main component of the report where you discuss and describe any exploratory data analysis that you did, any inferential statistical testing that you performed, if any, and what machine learnings were used and why.


### Define Foursquare Credentials and Version
Next, we are going to utilize the Foursquare API to explore the neighborhoods and segment them.

In [27]:
CLIENT_ID = '4RERZM5X0OFLE2UCKIXI0KJKFF4Q3MM2AO02Y45BAZNARUIN' # your Foursquare ID
CLIENT_SECRET = 'M2TBIVRFVYOBXHWE2TY2YNM1T5XQCJWDRFSTEQKVA3LFVT4N' # your Foursquare Secret
VERSION = '20180605' # Foursquare API version

print('Your credentials:')
print('CLIENT_ID: ' + CLIENT_ID)
print('CLIENT_SECRET:' + CLIENT_SECRET)

Your credentials:
CLIENT_ID: 4RERZM5X0OFLE2UCKIXI0KJKFF4Q3MM2AO02Y45BAZNARUIN
CLIENT_SECRET:M2TBIVRFVYOBXHWE2TY2YNM1T5XQCJWDRFSTEQKVA3LFVT4N
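
Hard-coding and printing API credentials makes them easy to leak when a notebook is shared. As a minimal sketch of an alternative, the credentials could be read from environment variables. The variable names `FOURSQUARE_CLIENT_ID` and `FOURSQUARE_CLIENT_SECRET` and the helper `load_foursquare_credentials` are assumptions made for this illustration; they are not part of the Foursquare API or of the original notebook.

```python
import os


def load_foursquare_credentials(env=os.environ):
    """Return (client_id, client_secret) read from a mapping of environment variables.

    The variable names are hypothetical and chosen only for this sketch.
    """
    client_id = env.get('FOURSQUARE_CLIENT_ID')
    client_secret = env.get('FOURSQUARE_CLIENT_SECRET')
    if not client_id or not client_secret:
        raise RuntimeError('Foursquare credentials are not set in the environment.')
    return client_id, client_secret
```

With both variables exported in the shell that starts the notebook, `CLIENT_ID, CLIENT_SECRET = load_foursquare_credentials()` would then replace the literal assignments above.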


## Exploration of venues in Barcelona's neighbourhoods 

First, let's add the target neighbourhood to the list of Barcelona's neighbourhoods, so that the venues of Barcelona's neighbourhoods and of the target neighbourhood can be explored in one go.

In [55]:
df=pd.DataFrame([['Target Neighbourhood', address2,latitude2,longitude2]])



Unnamed: 0,0,1,2,3,Borough,Latitude,Longitude,Neighborhood
0,,,,,Ciutat Vella,41.379889,2.189361,La Barceloneta
1,,,,,Ciutat Vella,41.382778,2.176944,El Gotic
2,,,,,Ciutat Vella,41.379722,2.168056,El Raval
3,,,,,Ciutat Vella,41.384608,2.182717,"Sant Pere, Santa Caterina i la Ribera"
4,,,,,Eixample,41.39,2.155,L'Antiga Esquerra de l'Eixample
5,,,,,Eixample,41.383389,2.149,La Nova Esquerra de l'Eixample
6,,,,,Eixample,41.395278,2.166667,Dreta de l'Eixample
7,,,,,Eixample,41.395675,2.183703,Fort Pienc
8,,,,,Eixample,41.403561,2.174347,Sagrada Família
9,,,,,Eixample,41.377778,2.161111,Sant Antoni
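
In the output above, the columns `0` to `3` are empty for the Barcelona rows, which suggests that the target row was created without column names and therefore did not line up with the existing columns. A minimal sketch of one way to append the row so that it aligns is shown below, assuming the column names seen in `data2`. The two-row `neighbourhoods` frame is a stand-in for `data2` so that the example is self-contained.

```python
import pandas as pd

# Stand-in for data2: two rows copied from the table shown earlier.
neighbourhoods = pd.DataFrame({
    'Borough': ['Ciutat Vella', 'Ciutat Vella'],
    'Neighborhood': ['La Barceloneta', 'El Gotic'],
    'Latitude': [41.379889, 41.382778],
    'Longitude': [2.189361, 2.176944],
})

# Name the columns explicitly so the target row aligns with the existing ones.
target = pd.DataFrame(
    [['Target Neighbourhood', 'Madrid, Barrio de Salamanca, SPAIN.', 40.43, -3.677778]],
    columns=['Borough', 'Neighborhood', 'Latitude', 'Longitude'],
)

# Stack the frames and rebuild a clean 0..n-1 index.
combined = pd.concat([neighbourhoods, target], ignore_index=True)
print(combined)
```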


Then, let's create a function to collect the venues within a given radius of each neighbourhood's coordinates:

In [28]:
def getNearbyVenues(names, latitudes, longitudes, radius=500):
    
    venues_list=[]
    for name, lat, lng in zip(names, latitudes, longitudes):
        print(name)
            
        # create the API request URL
        url = 'https://api.foursquare.com/v2/venues/explore?&client_id={}&client_secret={}&v={}&ll={},{}&radius={}&limit={}'.format(
            CLIENT_ID, 
            CLIENT_SECRET, 
            VERSION, 
            lat, 
            lng, 
            radius, 
            LIMIT)
            
        # make the GET request
        results = requests.get(url).json()["response"]['groups'][0]['items']
        
        # return only relevant information for each nearby venue
        venues_list.append([(
            name, 
            lat, 
            lng, 
            v['venue']['name'], 
            v['venue']['location']['lat'], 
            v['venue']['location']['lng'],  
            v['venue']['categories'][0]['name']) for v in results])

    nearby_venues = pd.DataFrame([item for venue_list in venues_list for item in venue_list])
    nearby_venues.columns = ['Neighborhood', 
                  'Neighborhood Latitude', 
                  'Neighborhood Longitude', 
                  'Venue', 
                  'Venue Latitude', 
                  'Venue Longitude', 
                  'Venue Category']
    
    return nearby_venues

In [33]:
LIMIT=500
barcelona_venues = getNearbyVenues(names=data2['Neighborhood'],
                                   latitudes=data2['Latitude'],
                                   longitudes=data2['Longitude']
                                  )

La Barceloneta
El Gotic
El Raval
Sant Pere, Santa Caterina i la Ribera
L'Antiga Esquerra de l'Eixample
La Nova Esquerra de l'Eixample
Dreta de l'Eixample
Fort Pienc
Sagrada Família
Sant Antoni
La Bordeta
La Font de la Guatlla
Hostafrancs
La Marina de Port
La Marina del Prat Vermell
El Poble-sec
Sants
Sants-Badal
Montjuïc
Zona Franca – Port
Les Corts
La Maternitat i Sant Ramon
Pedralbes
El Putget i Farró
Sarrià
Sant Gervasi – la Bonanova
Sant Gervasi – Galvany
Les Tres Torres
Vallvidrera, el Tibidabo i les Planes
Vila de Gràcia
Camp d'en Grassot i Gràcia Nova
La Salut
Vallcarca i els Penitents
El Baix Guinardó
El Guinardó
Can Baró
El Carmel
La Font d'en Fargues
Horta
Montbau
La Teixonera
Vall d'Hebron
Can Peguera
Canyelles 
Ciutat Meridiana
La Guineueta
Les Roquetes
Torre Baró
La Trinitat Nova
El Turó de la Peira
Baró de Viver
Bon Pastor
El Congrés i els Indians
Navas
Sant Andreu de Palomar
La Sagrera
Trinitat Vella
El Besòs i el Maresme
El Clot
El Camp de l'Arpa del Clot
Diagonal Mar i

Let's check the size of the resulting dataframe:

In [35]:
print(barcelona_venues.shape)
barcelona_venues.head()

(2821, 7)


Unnamed: 0,Neighborhood,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
0,La Barceloneta,41.379889,2.189361,Baluard Barceloneta,41.380047,2.18925,Bakery
1,La Barceloneta,41.379889,2.189361,BRO,41.380214,2.189007,Burger Joint
2,La Barceloneta,41.379889,2.189361,La Cova Fumada,41.379254,2.189254,Tapas Restaurant
3,La Barceloneta,41.379889,2.189361,Plaça de la Barceloneta,41.379739,2.188135,Plaza
4,La Barceloneta,41.379889,2.189361,Rumbanroll,41.380597,2.187807,Mediterranean Restaurant


Let's also check how many venues were returned for each neighbourhood.

In [36]:
barcelona_venues.groupby('Neighborhood').count()

Unnamed: 0_level_0,Neighborhood Latitude,Neighborhood Longitude,Venue,Venue Latitude,Venue Longitude,Venue Category
Neighborhood,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1
Baró de Viver,4,4,4,4,4,4
Bon Pastor,5,5,5,5,5,5
Camp d'en Grassot i Gràcia Nova,46,46,46,46,46,46
Can Baró,19,19,19,19,19,19
Can Peguera,34,34,34,34,34,34
Canyelles,7,7,7,7,7,7
Ciutat Meridiana,8,8,8,8,8,8
Diagonal Mar i el Front Marítim del Poblenou,80,80,80,80,80,80
Dreta de l'Eixample,100,100,100,100,100,100
El Baix Guinardó,42,42,42,42,42,42


Let's check how many unique categories can be curated from all the returned venues:

In [37]:
print('There are {} unique categories.'.format(len(barcelona_venues['Venue Category'].unique())))

There are 274 unique categories.


## Exploration of venues in target neighbourhood

## 3. Analyze Each Neighborhood
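This section is still to be written. One possible way to characterize each neighbourhood, sketched here as an assumption rather than as the method finally used, is to one-hot encode the venue categories and average them per neighbourhood, which gives the share of each category among a neighbourhood's venues. The small `venues` frame is toy data that mimics the `Neighborhood` and `Venue Category` columns of `barcelona_venues`.

```python
import pandas as pd

# Toy stand-in for barcelona_venues (only the two columns needed here).
venues = pd.DataFrame({
    'Neighborhood': ['La Barceloneta', 'La Barceloneta', 'El Gotic', 'El Gotic'],
    'Venue Category': ['Bakery', 'Tapas Restaurant', 'Plaza', 'Tapas Restaurant'],
})

# One column per category, 1.0 where the venue belongs to that category.
onehot = pd.get_dummies(venues['Venue Category'], dtype=float)
onehot.insert(0, 'Neighborhood', venues['Neighborhood'])

# The mean of the indicator columns is the frequency of each category per neighbourhood.
profiles = onehot.groupby('Neighborhood').mean()
print(profiles)
```

Each row of `profiles` sums to 1, so neighbourhoods with very different numbers of venues remain comparable.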

## 4. Cluster Neighborhoods
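This section is still to be written. As a hedged sketch of the clustering step outlined in the introduction, k-means (already imported as `KMeans`) could be run on per-neighbourhood category frequencies, and the neighbourhoods sharing a cluster with the target one kept as candidates. The neighbourhood names `A`, `B`, `C`, the three categories, the frequency values and the choice of two clusters are all made up for illustration.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Hypothetical category frequencies per neighbourhood (each row sums to 1).
profiles = pd.DataFrame(
    [[0.60, 0.30, 0.10],
     [0.50, 0.40, 0.10],
     [0.10, 0.20, 0.70],
     [0.55, 0.35, 0.10]],
    index=['A', 'B', 'C', 'Target Neighbourhood'],
    columns=['Restaurant', 'Bar', 'Park'],
)

# Fit k-means; n_clusters=2 is an arbitrary choice for this toy example.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(profiles.values)
labels = pd.Series(kmeans.labels_, index=profiles.index)

# Keep the neighbourhoods that share the target's cluster, excluding the target itself.
target_label = labels['Target Neighbourhood']
is_candidate = (labels == target_label) & (labels.index != 'Target Neighbourhood')
similar = labels[is_candidate].index.tolist()
print(similar)
```

On real data the number of clusters would need to be chosen with care, for instance by comparing several values of `n_clusters`.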

## 5. Select similar neighbours, calculate distances to work and estimate m2; present in plots
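This section is still to be written. The two estimates described in the introduction can be sketched as follows, under stated assumptions: the distance to work is approximated by the great-circle (haversine) distance between a neighbourhood's coordinates and the work location, and the apartment size is the budget divided by the price per square metre. The helper names are hypothetical, and `PLACEHOLDER_PRICE_PER_SQM` is a made-up value, not a figure from the Barcelona statistics page.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (latitude, longitude) points."""
    earth_radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def estimated_size_sqm(budget_eur, price_per_sqm_eur):
    """Apartment size affordable with the budget at a given price per square metre."""
    return budget_eur / price_per_sqm_eur


# Coordinates taken from earlier cells: Sants Estacio (work) and La Barceloneta.
work = (41.380586, 2.140598)
barceloneta = (41.379889, 2.189361)
distance_km = haversine_km(*barceloneta, *work)

budget = 300_000  # EUR, as assumed for the user
PLACEHOLDER_PRICE_PER_SQM = 4000.0  # EUR per m2, made up for illustration only
size_sqm = estimated_size_sqm(budget, PLACEHOLDER_PRICE_PER_SQM)

print('Distance to work: {:.2f} km'.format(distance_km))
print('Estimated size: {:.1f} m2'.format(size_sqm))
```

A straight-line distance ignores the street network and public transport, so turning it into a commuting time would require a further assumption such as an average travel speed.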

## Results 
Section where you discuss the results.


## Discussion 
Section where you discuss any observations you noted and any recommendations you can make based on the results.


## Conclusion 
Section where you conclude the report.
