# **REAL ESTATE ANALYSIS**: BUY/RENT HOUSES IN MILAN
## How to select the best opportunities according to OMI price quotations and surrounding VENUES for buying houses in Milan.
### Capstone Project - The Battle of Neighborhoods
Author: Pier Luigi Segatto, 02/02/2021 <br />
contact: pier.segatto@gmail.com

<img src="https://upload.wikimedia.org/wikipedia/commons/3/30/Wide_angle_Milan_skyline_from_Duomo_roof.jpg">

## Table of Contents

* [1. Introduction](#1.)
    * [1.1. Business Problem](#1.1.)
    * [1.2. Target Audience](#1.2.)
* [2. Data](#2.)
    * [2.1 Visualization](#2.1.)
* [3. Methodology](#3.)
    * [3.1. Requirements](#3.1.)
    * [3.2. Milan boroughs dataset](#3.2.)
    * [3.3. Milan neighborhoods visualization](#3.3.)
    * [3.4. OMI quotations housing price dataset](#3.4.)
    * [3.5. Neighborhood venues](#3.5.)
    

## 1. Introduction <a class="anchor" id="1."></a>

Milan is the second-most populous city in Italy after Rome. 
The city proper has a population of about 1.4 million, while its metropolitan city has 3.26 million inhabitants ([ISTAT](http://demo.istat.it/bilmens2019gen/index.html)). Its continuously built-up urban area, which stretches well beyond the boundaries of the administrative metropolitan city, is the fourth largest in the EU with 5.27 million inhabitants. The population within the wider Milan metropolitan area, also known as Greater Milan, is estimated at 8.2 million, making it by far the largest metropolitan area in Italy and the third largest in the EU ([source](http://www.old.unimib.it/open/news/Le-aree-metropolitane-in-Italia-occupano-il-9-per-cento-del-territorio/193547881368277998)). <br />
Milan is considered a leading global city, with strengths in art, commerce, design, education, entertainment, fashion, finance, healthcare, media, services, research and tourism. The city has been recognized as one of the world's four fashion capitals thanks to several international events and fairs, including Milan Fashion Week and the Milan Furniture Fair, which are currently among the world's biggest in terms of revenue, visitors and growth. It hosts numerous cultural institutions, academies and universities. <br />
Whereas Rome is Italy's political capital, Milan is the country's industrial and financial heart. In 2019 Milan's GDP per capita was estimated at €49,000, steadily increasing and significantly higher than the Italian average of €26,000 ([source](https://www.assolombarda.it/media/comunicati-stampa/rassegna-stampa-osservatorio-milano-2019-7-novembre-2019)). <br />
Milan was the destination of 11 million visitors in 2019, as reported on the city website ([source](https://www.comune.milano.it/-/turismo.-nel-2019-sfiorati-11-milioni-di-visitatori)). They are attracted by its museums and art galleries, which hold some of the most important collections in the world, including major works by Leonardo da Vinci. The city is served by many luxury hotels and dreamy restaurants. <br />
Last but not least, Milan will host the 2026 Winter Olympics together with Cortina d'Ampezzo. <br />

## 1.1. Business Problem <a class="anchor" id="1.1."></a> 

Milan is the epicenter of Italian life: it attracts companies and people who move their core businesses and their lives there. Given the wide variety of services and opportunities on offer, housing prices in Milan can be high and vary considerably from one area of the city to another.

The goal of this project is to develop a tool for finding the most efficient solution, *venue*- and *price*-wise, for buying a house in Milan. The project focuses on characterizing each neighborhood in terms of house prices and relevant venues in the surrounding area (restaurants, gyms, parks...). By adopting machine learning techniques such as clustering and regression, it will answer the following questions: 

<font color = "black" size = "+1">1. If you want to buy or rent a house in Milan, which is the best neighborhood according to your capital, your lifestyle, and your needs?</font>

<font color = "black" size = "+1">2. If you want to eat sushi and visit a museum, which neighborhood should you visit? </font>

<font color = "black" size = "+1">3. You are looking for an apartment close to a transport station and to an Italian restaurant: which neighborhood should you consider? </font>

<img src="https://traveldir.co/wp-content/uploads/2020/12/milan-info-map-of-italy-with-yellow-pin-marking-milano-centro-storico.jpg">

## 1.2. Target Audience <a class="anchor" id='1.2.'></a>

Real estate agencies. 

Housing investors.

Private individuals looking for the perfect place to rent or buy a house in Milan. 

Tourists.

## 2. Data <a class='anchor' id='2.'></a>

The data for this project were retrieved from multiple sources, chosen with particular attention to their reliability:
1. [Milan borough dataset](#Borough) and [house market and rental values dataset](#Values): retrieved from the Italian Revenue Agency website ([source](https://www.agenziaentrate.gov.it/portale/schede/fabbricatiterreni/omi/banche-dati/quotazioni-immobiliari)). They provide the list of Milan boroughs and the market and rental values of houses, broken down by house location and condition of the property. The values refer to the 1st half of 2020, a period affected by the negative influence of the COVID-19 pandemic on real estate markets. <br /> Accessing the CSV files requires registering on the website.
2. [Geo-locational information of Milan city center and the neighborhoods](#Location): the latitude and longitude of Milan city center and of each neighborhood, retrieved by geocoding with Nominatim through the geopy library.
3. [Surrounding venues for each neighborhood](#Venues): obtained using the Foursquare API.

These datasets make it possible to explore the city and apply ML algorithms to gain insights on Milan and inform the final user about the best locations. The [Milan borough dataset](#Borough) is used to determine the value of a house on the basis of the borough position and the condition of the property. Neighborhood locations are fundamental to understanding the correlation between the position of a neighborhood (in terms of distance from the Milan city center) and the value of its houses. These positions, together with the venue data, are essential to determine the clusters and identify the most common venues in each of them.

## 3. Methodology <a class='anchor' id='3.'></a>

In the following sections:
- Libraries and external packages are loaded, Milan datasets are imported, cleaned and explored.
- Neighborhood locations are visualized, and venues are downloaded and formatted to meet the required standards.

## 3.1. Requirements <a class='anchor' id='3.1.'></a>

The first important step in data science is data retrieval: no analysis can be reliable and precise without good data and appropriate techniques and algorithms. <br/>
This analysis starts with data collection and cleaning, in order to gather all the data needed to achieve the goal of this study.

### Download Libraries

Uncomment the next cell if folium or geopy are not available.

In [165]:
# !conda install -c conda-forge folium=0.5.0 --yes 
# !conda install -c conda-forge geopy --yes 

Import the required libraries. 

In [166]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Optimization and machine learning libraries
from scipy.optimize import curve_fit # fitting routines
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from matplotlib.colors import LinearSegmentedColormap

# Seaborn library for visualization
import seaborn as sns

import folium # map rendering library

print('Libraries imported.')

Libraries imported.


## 3.2. Milan boroughs dataset <a class='anchor' id='3.2.'></a>

Use the requests library to download the Milan dataset.

In [201]:
url = "https://raw.githubusercontent.com/pierluigisegatto/Data_Science_Material/main/IBM%20Data%20Science%20Professional%20Certificate/8-ML_Full_Project/Real%20estate%20Milan%20/Milan%20Neighborhood.csv"
page = requests.get(url)
if page.status_code == 200:
    print('Page download successful')
else:
    print('Page download error. Error code: {}'.format(page.status_code))

Page download successful


In [202]:
# be careful to use ';' as a separator
neighborhoods=pd.read_csv(url, sep=';')
neighborhoods.head(5)

Unnamed: 0,Area_territoriale,Regione,Prov,Comune_ISTAT,Comune_cat,Sez,Comune_amm,Comune_descrizione,Fascia,Zona_Descr,Zona,LinkZona,Cod_tip_prev,Descr_tip_prev,Stato_prev,Microzona
0,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,"'CENTRO STORICO -DUOMO, SANBABILA, MONTENAPOLE...",B12,MI00003228,20,Abitazioni civili,N,2
1,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,'CENTRO STORICO -UNIVERSITA STATALE',B13,MI00003232,20,Abitazioni civili,N,3
2,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,'CENTRO STORICO - BRERA',B15,MI00003544,20,Abitazioni civili,O,0
3,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,"'CENTRO STORICO -SANT`AMBROGIO, CADORNA, VIA D...",B16,MI00003545,20,Abitazioni civili,O,0
4,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,"'PARCO SEMPIONE, ARCO DELLA PACE'",B17,MI00004767,20,Abitazioni civili,N,0
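
The status check and `pd.read_csv(url, ...)` each fetch the file from GitHub. As a minimal sketch, assuming the response body is already held in memory as text, the CSV could instead be parsed from that text with `io.StringIO`. The `sample_text` string is a shortened, hypothetical stand-in for `page.text`, not the real file.

```python
import io

import pandas as pd

# Hypothetical stand-in for `page.text`; the real file has many more columns.
sample_text = (
    "Zona_Descr;Zona;Fascia\n"
    "'CENTRO STORICO - BRERA';B15;B\n"
    "'PARCO SEMPIONE, ARCO DELLA PACE';B17;B\n"
)

# Parse the already-downloaded text instead of requesting the URL a second time.
sample_df = pd.read_csv(io.StringIO(sample_text), sep=';')
print(sample_df.shape)
```

In the notebook, `io.StringIO(page.text)` would take the place of `io.StringIO(sample_text)`.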


In [203]:
# understand a bit the dataset
print(f'Milan has {neighborhoods.shape[0]} Boroughs')

Milan has 38 Boroughs


In [204]:
# check for unique values 
print(neighborhoods['Descr_tip_prev'].unique())
print(neighborhoods['Stato_prev'].unique())
print(neighborhoods['Microzona'].unique())

['Abitazioni civili' 'Non presente']
['N' 'O' nan]
[ 2  3  0 22 32 34 35 37 40 42 43 47 50]


In [205]:
# where does the anomaly occur? NaN never equals itself, so test with isna() rather than == np.nan
neighborhoods.loc[(neighborhoods['Descr_tip_prev'] == 'Non presente') | neighborhoods['Stato_prev'].isna(), 'Zona_Descr']

37    'RONCHETTO, CHIARAVALLE, RIPAMONTI'
Name: Zona_Descr, dtype: object
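
When looking for missing values, note that NaN never compares equal to anything, itself included, so an equality test against `np.nan` selects no rows. The sketch below uses a made-up three-row frame (not the OMI data) to contrast that test with `isna()`.

```python
import numpy as np
import pandas as pd

# Toy data, for illustration only: one row has a missing state.
toy = pd.DataFrame({'Zona': ['B12', 'B15', 'D35'],
                    'Stato_prev': ['N', 'O', np.nan]})

# Equality with NaN is always False, so this mask selects nothing.
mask_eq = toy['Stato_prev'] == np.nan
# isna() flags the missing entry as intended.
mask_isna = toy['Stato_prev'].isna()

print(mask_eq.sum(), mask_isna.sum())
```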


Keep only the needed columns, i.e. the neighborhood description, the zone code and the area. Translate the column names and strip the quotes from the neighborhood names.

Get the actual neighborhood names by splitting each descriptive name (borough) at every comma. This step is needed to retrieve the latitude and longitude of each neighborhood.

In [217]:
# keep useful cols
milan = neighborhoods[['Zona_Descr','Zona','Fascia']].copy() 
# translate col names
milan.rename(columns={'Zona_Descr':'Neighborhoods',"Zona":'Code','Fascia':'Area'},inplace=True)
# Get rid of quotes in names (assign back instead of a chained in-place replace)
milan['Neighborhoods'] = milan['Neighborhoods'].str.replace("'", '', regex=False)
# explode the dataset to get the actual neighborhood names
milan = milan.set_index(['Code','Area']).apply(lambda x: x.str.split(',').explode()).reset_index()
milan.head(5)

Unnamed: 0,Code,Area,Neighborhoods
0,B12,B,CENTRO STORICO -DUOMO
1,B12,B,SANBABILA
2,B12,B,MONTENAPOLEONE
3,B12,B,MISSORI
4,B12,B,CAIROLI


In [218]:
print(f'Milan has {milan.shape[0]} Neighborhoods')

Milan has 104 Neighborhoods
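
The split-and-explode step can be illustrated on a toy frame. This is a sketch with made-up rows shaped like the borough table, using `DataFrame.explode` on a single column rather than the `apply` form used above.

```python
import pandas as pd

# Toy rows in the same shape as the borough table (illustrative only).
toy = pd.DataFrame({'Code': ['B15', 'B17'],
                    'Neighborhoods': ['BRERA', 'PARCO SEMPIONE, ARCO DELLA PACE']})

# Turn each comma-separated string into a list, then emit one row per item.
exploded = (toy.assign(Neighborhoods=toy['Neighborhoods'].str.split(','))
               .explode('Neighborhoods')
               .reset_index(drop=True))
# Names that followed a comma keep a leading space, so strip it.
exploded['Neighborhoods'] = exploded['Neighborhoods'].str.strip()
print(exploded)
```

Each borough code is repeated once per neighborhood, which is why the 38 boroughs expand to a longer list of names.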


Some neighborhood names are not recognized by the geocoder (Nominatim, used through geopy below). They must be edited to ensure the success of the next step, the retrieval of latitudes and longitudes.

In [219]:
milan['Neighborhoods'].replace("SANBABILA", "SAN BABILA", regex=True, inplace=True)
milan['Neighborhoods'].replace("FAENZA", "VIALE FAENZA", regex=True, inplace=True)
milan['Neighborhoods'].replace("P.ROSSI", "VIA PELLEGRINO ROSSI", regex=True, inplace=True)
milan['Neighborhoods'].replace("CORSO VENEZIA", "PORTA VENEZIA", regex=True, inplace=True)
milan['Neighborhoods'].replace("UNIVERSITA STATALE", "CITTÀ STUDI", regex=True, inplace=True)
milan['Neighborhoods'].replace("STAZIONE CENTRALE VIALE STELVIO", "STAZIONE CENTRALE", regex=True, inplace=True)
milan['Neighborhoods'].replace("C.NA MERLATA", "CASCINA MERLATA", regex=True, inplace=True)
milan['Neighborhoods'].replace("MONZA", "VIALE MONZA", regex=True, inplace=True)
milan['Neighborhoods'].replace("BUENOS AIRES", "CORSO BUENOS AIRES", regex=True, inplace=True)
milan['Neighborhoods'].replace("TITO LIVIO", "VIA TITO LIVIO", regex=True, inplace=True)
milan['Neighborhoods'].replace("MAROCCHETTI", "VIA CARLO MAROCHETTI", regex=True, inplace=True)
milan['Neighborhoods'].replace("REGINA GIOVANNA", "VIALE REGINA GIOVANNA", regex=True, inplace=True)
milan['Neighborhoods'].replace("ASCANIO SFORZA", "VIA ASCANIO SFORZA", regex=True, inplace=True)
milan['Neighborhoods'].replace("Q. ROMANO", "QUINTO ROMANO", regex=True, inplace=True)

milan['Neighborhoods'] = milan['Neighborhoods'].str.replace('CENTRO STORICO -', '')


In [220]:
# remove all white spaces at beginning of each neigh name
milan['Neighborhoods'] = milan['Neighborhoods'].str.lstrip(' ')
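
One caveat about the renaming cell: with `regex=True`, characters such as `.` in patterns like `P.ROSSI` or `C.NA MERLATA` act as regex wildcards. The sketch below, which uses a made-up look-alike name, shows the difference once the pattern is escaped with `re.escape`. Whether the real dataset contains such look-alikes is not known here.

```python
import re

import pandas as pd

names = pd.Series(['P.ROSSI', 'PAROSSI'])  # second name is a made-up look-alike

# As a regex, "." matches any character, so both names are rewritten.
as_regex = names.replace('P.ROSSI', 'VIA PELLEGRINO ROSSI', regex=True)
# Escaping the pattern restricts the match to a literal dot.
escaped = names.replace(re.escape('P.ROSSI'), 'VIA PELLEGRINO ROSSI', regex=True)

print(as_regex.tolist())
print(escaped.tolist())
```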

In [221]:
neigh_df = milan[['Neighborhoods']]
# neigh_df = neigh_df.drop_duplicates(keep='first')
neigh_df.shape[0]
# porta romana occurs 2 times one with code B19 and one with code B20
print(f'Milan has {neigh_df.shape[0]} unique Neighborhood names')

Milan has 104 unique Neighborhood names


### Retrieve the geo-locational information 

In [222]:
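from geopy.exc import GeocoderTimedOut

# Build a full address for each neighborhood to help the geocoder disambiguate
address = neigh_df['Neighborhoods'] + ', Milano, MI , Italia'
geolocater = Nominatim(user_agent="milan_coordinates-explorer")
location = []  # [address, latitude, longitude] for every address found
empty = []     # addresses the geocoder could not resolve

def getcoords(add):
    try:
        coords = geolocater.geocode(add, timeout=10)
        location.append([add, coords.latitude, coords.longitude])
        print("the coords are {}".format(location[-1]))

    except GeocoderTimedOut:
        # retry on timeout (note: there is no limit on the number of retries)
        return getcoords(add)

    except Exception:
        # geocode returned None (no match) or another geocoder error occurred
        empty.append([add])
        print("Couldn't find coords of {}".format(empty[-1]))

for add in address:
    getcoords(add)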

the coords are ['DUOMO, Milano, MI , Italia', 45.4645848, 9.1896695]
the coords are ['SAN BABILA, Milano, MI , Italia', 45.4665214, 9.1975286]
the coords are ['MONTENAPOLEONE, Milano, MI , Italia', 45.470015, 9.1928678]
the coords are ['MISSORI, Milano, MI , Italia', 45.4598278, 9.1895549]
the coords are ['CAIROLI, Milano, MI , Italia', 45.4687012, 9.1816966]
the coords are ['CITTÀ STUDI, Milano, MI , Italia', 45.4770557, 9.2265746]
the coords are ['BRERA, Milano, MI , Italia', 45.47347885, 9.188407990372653]
the coords are ['SANT`AMBROGIO, Milano, MI , Italia', 45.4613906, 9.1729167]
the coords are ['CADORNA, Milano, MI , Italia', 45.4681551, 9.1771024]
the coords are ['VIA DANTE, Milano, MI , Italia', 45.4663326, 9.1847796]
the coords are ['PARCO SEMPIONE, Milano, MI , Italia', 45.47301905, 9.176969268773153]
the coords are ['ARCO DELLA PACE, Milano, MI , Italia', 45.47569195, 9.172427802834267]
the coords are ['TURATI, Milano, MI , Italia', 45.475039, 9.1947243]
the coords are ['MOS
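
The retry on timeout above recurses without a limit, so a geocoder that keeps timing out would never return. A minimal sketch of a bounded retry follows. The function name, its parameters and the pause schedule are all hypothetical, and `geocode` stands for any callable that returns a result or `None`.

```python
import time


def geocode_with_retry(geocode, address, retry_on=(TimeoutError,),
                       max_retries=3, pause=1.0):
    """Call geocode(address), retrying a bounded number of times.

    Hypothetical helper: `retry_on` lists the exception types that
    trigger a new attempt; anything else propagates to the caller.
    """
    for attempt in range(max_retries):
        try:
            return geocode(address)
        except retry_on:
            # Wait a little longer after each failed attempt.
            time.sleep(pause * (attempt + 1))
    return None  # give up instead of retrying forever
```

With geopy this might be called as `geocode_with_retry(lambda a: geolocater.geocode(a, timeout=10), add, retry_on=(GeocoderTimedOut,))`, treating a `None` result as "not found".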

### Save found neighborhood coordinates 

In [223]:
milan['Area'].shape
neighborhoods_coordinates.shape

(103, 4)

In [224]:
neighborhoods_coordinates = pd.DataFrame(location, columns=['Neighborhoods','Latitude','Longitude'])
neighborhoods_coordinates['Neighborhoods'].replace(", Milano, MI , Italia", "", regex=True, inplace=True)
neighborhoods_coordinates['Area'] = milan['Area']
neighborhoods_coordinates.to_csv('coordinates.csv')

neighborhoods_coordinates.head()

Unnamed: 0,Neighborhoods,Latitude,Longitude,Area
0,DUOMO,45.464585,9.18967,B
1,SAN BABILA,45.466521,9.197529,B
2,MONTENAPOLEONE,45.470015,9.192868,B
3,MISSORI,45.459828,9.189555,B
4,CAIROLI,45.468701,9.181697,B
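
The coordinates table has 103 rows against 104 neighborhoods, so at least one address was not geocoded. Assigning `milan['Area']` to a frame with a fresh 0..n-1 index may then pair later rows with the area of their predecessor. The sketch below uses toy data with illustrative coordinates and assumes the original row index is stored alongside each result. It contrasts index-preserving alignment with positional assignment.

```python
import pandas as pd

# Toy stand-ins: three neighborhoods, the second of which was not geocoded.
milan_toy = pd.DataFrame({'Neighborhoods': ['DUOMO', 'UNKNOWN PLACE', 'LORENTEGGIO'],
                          'Area': ['B', 'C', 'D']})
# (original row index, latitude, longitude); values are illustrative.
found = [(0, 45.4646, 9.1897), (2, 45.4480, 9.1250)]

coords = pd.DataFrame(found, columns=['row', 'Latitude', 'Longitude']).set_index('row')
# join aligns on the preserved row index, so every point keeps its own Area.
aligned = milan_toy.join(coords, how='inner')

# A fresh 0..n-1 index pairs the second point with the Area of the missing row.
shifted = coords.reset_index(drop=True)
shifted['Area'] = milan_toy['Area']

print(aligned[['Neighborhoods', 'Area']])
print(shifted['Area'].tolist())
```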


## 3.3 Milan neighborhoods visualization <a class='anchor' id='3.3.'></a>

To put the study in context, it is important to know the precise location of each neighborhood. For this reason, a map of Milan showing all the neighborhood positions is created, with the Milan areas (B, C, D and E) distinguished by color. <br />
The map of Milan neighborhoods is plotted using the Folium library.

### Retrieve Milan coordinates

In [225]:
Milan_address='Milan, Italy'
geolocater= Nominatim(user_agent="Milan_search")
center= geolocater.geocode(Milan_address)
lat= center.latitude
lon= center.longitude
print('The geographical coordinates of {} are {}, {}.'.format(Milan_address,lat, lon))

The geographical coordinates of Milan, Italy are 45.4668, 9.1905.
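
The Data section relates house values to the distance of each neighborhood from the city center. As a sketch of how such a distance could be computed from the coordinates gathered so far, the helper below applies the haversine formula. The choice of this formula, the function name and the Earth radius of 6371 km are assumptions, since the notebook does not state which distance measure it uses.

```python
from math import asin, cos, radians, sin, sqrt


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))


# Coordinates printed in this notebook: Milan city center and DUOMO.
center = (45.4668, 9.1905)
duomo = (45.4645848, 9.1896695)
print(round(haversine_km(*center, *duomo), 3), 'km')
```

Applied row by row to `Latitude` and `Longitude`, the same function would give a distance-from-center column.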


### Create the map

In [293]:
neighborhoods_coordinates['Area'] = milan['Code'].str[:1]
neighborhoods_coordinates['Code'] = milan['Code']
neighborhoods_coordinates.head()


Unnamed: 0,Neighborhoods,Latitude,Longitude,Area,Code
0,DUOMO,45.464585,9.18967,B,B12
1,SAN BABILA,45.466521,9.197529,B,B12
2,MONTENAPOLEONE,45.470015,9.192868,B,B12
3,MISSORI,45.459828,9.189555,B,B12
4,CAIROLI,45.468701,9.181697,B,B12


In [227]:
Milan_map=folium.Map(location=[lat,lon],zoom_start=11)

def color(letter): 
    if letter == 'B': 
        col = 'blue'
    elif letter == 'C': 
        col = 'green'
    elif letter == 'D': 
        col = 'purple'
    else: 
        col='red'
    return col 

# use dedicated loop names so the city-center lat/lon are not overwritten
for nb_lat, nb_lon, neighborhood, letter in zip(neighborhoods_coordinates['Latitude'], neighborhoods_coordinates['Longitude'], neighborhoods_coordinates['Neighborhoods'], neighborhoods_coordinates['Area']):
    label=folium.Popup(neighborhood, parse_html=True)
    folium.CircleMarker(
        [nb_lat, nb_lon],
        radius=2,
        popup=label,
        color=color(letter),  # the border color encodes the OMI area
        fill=True,
        fill_color='#3199cc',
        fill_opacity=0.3).add_to(Milan_map)
    
legend_html = '''
    <div style="position: fixed;  
        bottom: 10px; left: 10px; width: 100px; height: 120px; 
        border:2px solid grey; z-index:9999; font-size:11px;">&nbsp; Legend <br>
        &nbsp; <b> Area B </b> &nbsp; <i class="fa fa-map-marker fa-2x" style="color:blue"></i><br>
        &nbsp; <b> Area C </b> &nbsp; <i class="fa fa-map-marker fa-2x" style="color:green"></i><br>
        &nbsp; <b> Area D </b> &nbsp; <i class="fa fa-map-marker fa-2x" style="color:purple"></i><br>
        &nbsp; <b> Area E </b> &nbsp; <i class="fa fa-map-marker fa-2x" style="color:red"></i><br>
    </div>
    '''
Milan_map.get_root().html.add_child(folium.Element(legend_html))

Milan_map

## 3.4 OMI quotations housing price dataset <a class='anchor' id='3.4.'></a>

In [276]:
url = "https://raw.githubusercontent.com/pierluigisegatto/Data_Science_Material/main/IBM%20Data%20Science%20Professional%20Certificate/8-ML_Full_Project/Real%20estate%20Milan%20/Market%20values%20of%20house%20in%20Milan.csv"
page = requests.get(url)
if page.status_code == 200:
    print('Page download successful')
else:
    print('Page download error. Error code: {}'.format(page.status_code))

Page download successful


In [277]:
values=pd.read_csv(url, sep=';')
values.head(10)

Unnamed: 0,Area_territoriale,Regione,Prov,Comune_ISTAT,Comune_cat,Sez,Comune_amm,Comune_descrizione,Fascia,Zona,LinkZona,Cod_Tip,Descr_Tipologia,Stato,Stato_prev,Compr_min,Compr_max,Sup_NL_compr,Loc_min,Loc_max,Sup_NL_loc
0,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,20,Abitazioni civili,OTTIMO,,9000,12300,L,28,375,L
1,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,20,Abitazioni civili,NORMALE,P,7400,9000,L,23,28,L
2,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,21,Abitazioni di tipo economico,NORMALE,P,6500,7800,L,18,22,L
3,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,21,Abitazioni di tipo economico,OTTIMO,,7800,9000,L,24,30,L
4,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,19,Abitazioni signorili,OTTIMO,P,11200,14300,L,375,46,L
5,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,13,Box,NORMALE,P,4700,6600,L,145,215,L
6,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,9,Magazzini,NORMALE,P,2550,3300,L,14,18,L
7,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,5,Negozi,OTTIMO,,15100,21600,L,83,124,L
8,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,5,Negozi,NORMALE,P,8800,12400,L,40,59,L
9,NORD-OVEST,LOMBARDIA,MI,3015146,C1AA,,F205,MILANO,B,B12,MI00003228,6,Uffici,OTTIMO,P,5800,9000,L,213,36,L


For each combination of zone, housing type and condition (when available), this dataset reports the minimum and maximum observed sale and rental quotations. 


In [278]:
values.shape

(484, 21)

In [280]:
# check the housing types available
values['Descr_Tipologia'].unique()

array(['Abitazioni civili', 'Abitazioni di tipo economico',
       'Abitazioni signorili', 'Box', 'Magazzini', 'Negozi', 'Uffici',
       'Uffici strutturati', 'Laboratori', 'Capannoni industriali',
       'Capannoni tipici', 'Ville e Villini'], dtype=object)

To merge this dataset with the neighborhoods, only the zone code, the housing description, the condition and the prices are needed.

In [287]:
columns = ['Zona','Descr_Tipologia','Stato','Compr_min','Compr_max','Loc_min','Loc_max']
omi = values[columns].copy()
omi.head()

Unnamed: 0,Zona,Descr_Tipologia,Stato,Compr_min,Compr_max,Loc_min,Loc_max
0,B12,Abitazioni civili,OTTIMO,9000,12300,28,"37,5"
1,B12,Abitazioni civili,NORMALE,7400,9000,23,28
2,B12,Abitazioni di tipo economico,NORMALE,6500,7800,18,22
3,B12,Abitazioni di tipo economico,OTTIMO,7800,9000,24,30
4,B12,Abitazioni signorili,OTTIMO,11200,14300,"37,5",46


**N.B.:** Since this analysis targets houses, all non-residential quotations are dropped; only the 'Abitazioni civili' and 'Abitazioni signorili' types are kept. 

In [288]:
# translate column headers
omi.rename(columns={'Zona' : 'Code', 'Descr_Tipologia' : 'Housing_type', 'Stato' : 'Condition', 'Compr_min' : 'Min_market_value (€/m2)', 'Compr_max' : 'Max_market_value (€/m2)', 'Loc_min' : 'Min_rental_value (€/m2 x month)', 'Loc_max' : 'Max_rental_value (€/m2 x month)',}, inplace=True)
print(omi.shape)

# keep residential quotations only (civil and stately houses); drop everything else
omi = omi[omi['Housing_type'].isin(['Abitazioni civili', 'Abitazioni signorili'])]
# Translate housing types and condition values to English
omi = omi.replace(regex={'OTTIMO': 'Excellent','NORMALE':'Normal','Abitazioni civili':'Residential houses','Abitazioni signorili':'Stately houses'})
print(omi.shape)
omi.head()

(484, 7)
(88, 7)


Unnamed: 0,Code,Housing_type,Condition,Min_market_value (€/m2),Max_market_value (€/m2),Min_rental_value (€/m2 x month),Max_rental_value (€/m2 x month)
0,B12,Residential houses,Excellent,9000,12300,28,"37,5"
1,B12,Residential houses,Normal,7400,9000,23,28
4,B12,Stately houses,Excellent,11200,14300,"37,5",46
12,B13,Residential houses,Excellent,6900,8200,"18,5","27,3"
13,B13,Residential houses,Normal,5000,6900,"14,5","18,5"


### Adjust the data type of the dataframe 

In [289]:
omi.dtypes

Code                               object
Housing_type                       object
Condition                          object
Min_market_value (€/m2)             int64
Max_market_value (€/m2)             int64
Min_rental_value (€/m2 x month)    object
Max_rental_value (€/m2 x month)    object
dtype: object

Note that the Min_rental_value and Max_rental_value columns have dtype object instead of a numeric type, so they are converted to float. 
The values use a comma rather than a dot as the decimal separator, so ',' is first replaced with '.'.

In [290]:
columns_to_change = ['Min_rental_value (€/m2 x month)','Max_rental_value (€/m2 x month)']
omi[columns_to_change]=omi[columns_to_change].replace(regex={',':'.'})

omi[columns_to_change] = omi[columns_to_change].astype('float')

omi.head()

Unnamed: 0,Code,Housing_type,Condition,Min_market_value (€/m2),Max_market_value (€/m2),Min_rental_value (€/m2 x month),Max_rental_value (€/m2 x month)
0,B12,Residential houses,Excellent,9000,12300,28.0,37.5
1,B12,Residential houses,Normal,7400,9000,23.0,28.0
4,B12,Stately houses,Excellent,11200,14300,37.5,46.0
12,B13,Residential houses,Excellent,6900,8200,18.5,27.3
13,B13,Residential houses,Normal,5000,6900,14.5,18.5


In [291]:
omi.dtypes

Code                                object
Housing_type                        object
Condition                           object
Min_market_value (€/m2)              int64
Max_market_value (€/m2)              int64
Min_rental_value (€/m2 x month)    float64
Max_rental_value (€/m2 x month)    float64
dtype: object
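
As an alternative to replacing commas after loading, `pd.read_csv` accepts a `decimal` argument. The sketch below applies it to a small in-memory sample that mirrors a few rental values shown above. It is an illustration on stand-in data, not a re-run on the real file.

```python
import io

import pandas as pd

# Small sample in the style of the OMI file: ';' separator, ',' decimal mark.
sample = "Zona;Loc_min;Loc_max\nB12;28;37,5\nB13;18,5;27,3\n"

# decimal=',' tells the parser to read the comma as the decimal mark.
parsed = pd.read_csv(io.StringIO(sample), sep=';', decimal=',')
print(parsed.dtypes)
```

With this option the rental columns arrive as floats directly, so the string replacement and the `astype('float')` step would not be needed.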

### Merge the neighborhoods_coordinates and omi dataframes on borough codes

In [297]:
full_df = pd.merge(neighborhoods_coordinates, omi, on='Code')
full_df.head()

Unnamed: 0,Neighborhoods,Latitude,Longitude,Area,Code,Housing_type,Condition,Min_market_value (€/m2),Max_market_value (€/m2),Min_rental_value (€/m2 x month),Max_rental_value (€/m2 x month)
0,DUOMO,45.464585,9.18967,B,B12,Residential houses,Excellent,9000,12300,28.0,37.5
1,DUOMO,45.464585,9.18967,B,B12,Residential houses,Normal,7400,9000,23.0,28.0
2,DUOMO,45.464585,9.18967,B,B12,Stately houses,Excellent,11200,14300,37.5,46.0
3,SAN BABILA,45.466521,9.197529,B,B12,Residential houses,Excellent,9000,12300,28.0,37.5
4,SAN BABILA,45.466521,9.197529,B,B12,Residential houses,Normal,7400,9000,23.0,28.0


In [298]:
full_df.shape

(221, 11)
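
The default inner merge drops any neighborhood whose code has no matching quotation, without reporting it. As a sketch of how such cases could be surfaced, the toy example below uses a left merge with `indicator=True`. The frames are made up, and `E99` is a hypothetical code with no quotation.

```python
import pandas as pd

# Toy frames; 'E99' is a hypothetical code with no price quotation.
coords_toy = pd.DataFrame({'Neighborhoods': ['DUOMO', 'SAN BABILA', 'SOMEWHERE'],
                           'Code': ['B12', 'B12', 'E99']})
omi_toy = pd.DataFrame({'Code': ['B12', 'B12'],
                        'Condition': ['Excellent', 'Normal']})

# A left merge keeps unmatched neighborhoods; the _merge column labels them.
checked = pd.merge(coords_toy, omi_toy, on='Code', how='left', indicator=True)
unmatched = checked.loc[checked['_merge'] == 'left_only', 'Neighborhoods'].tolist()

print(len(checked), unmatched)
```

Every neighborhood is repeated once per matching quotation, which is also why `full_df` has more rows than the coordinates table.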

## 3.5 Neighborhood venues <a class='anchor' id='3.5.'></a>