<font color = "black" size = "+8">**REAL ESTATE ANALYSIS**: BUY/RENT HOUSES IN MILAN.</font>
# How to select the best opportunities according to _OMI_ prices and _VENUES_ for buying houses in Milan.
### Capstone Project - The Battle of Neighborhoods
Author: Pier Luigi Segatto, 02/02/2021 <br />
contact: pier.segatto@gmail.com

<img src="https://upload.wikimedia.org/wikipedia/commons/3/30/Wide_angle_Milan_skyline_from_Duomo_roof.jpg">

## Table of Contents

* [1. Introduction](#1.)
    * [1.1. Business Problem](#1.1.)
    * [1.2. Target Audience](#1.2.)
* [2. Data](#2.)
    * [2.1 Visualization](#2.1.)
* [3. Methodology](#3.)
    * [3.1. Requirements, Data Loading and Visualization](#3.1.)

## 1. Introduction <a class="anchor" id="1."></a>

Milan is the second-most populous city in Italy after Rome. 
The city proper has a population of about 1.4 million, while its metropolitan city has 3.26 million inhabitants ([ISTAT](http://demo.istat.it/bilmens2019gen/index.html)). Its continuously built-up urban area, which stretches well beyond the boundaries of the administrative metropolitan city, is the fourth largest in the EU with 5.27 million inhabitants. The population within the wider Milan metropolitan area, also known as Greater Milan, is estimated at 8.2 million, making it by far the largest metropolitan area in Italy and the third largest in the EU ([source](http://www.old.unimib.it/open/news/Le-aree-metropolitane-in-Italia-occupano-il-9-per-cento-del-territorio/193547881368277998)). <br />
Milan is considered a leading global city, with strengths in art, commerce, design, education, entertainment, fashion, finance, healthcare, media, services, research and tourism. The city has been recognized as one of the world's four fashion capitals thanks to several international events and fairs, including Milan Fashion Week and the Milan Furniture Fair, which are currently among the world's biggest in terms of revenue, visitors and growth. It hosts numerous cultural institutions, academies and universities. <br />
Whereas Rome is Italy's political capital, Milan is the country's industrial and financial heart. In 2019, Milan's GDP per capita was estimated at €49,000, steadily increasing and significantly higher than the Italian average of €26,000 ([source](https://www.assolombarda.it/media/comunicati-stampa/rassegna-stampa-osservatorio-milano-2019-7-novembre-2019)). <br />
In 2019, Milan welcomed nearly 11 million visitors, as reported on the city website ([source](https://www.comune.milano.it/-/turismo.-nel-2019-sfiorati-11-milioni-di-visitatori)). They are attracted by its museums and art galleries, which include some of the most important collections in the world, such as major works by Leonardo da Vinci. The city is served by many luxury hotels and dreamy restaurants. <br />
Last but not least, Milan will host the 2026 Winter Olympics together with Cortina d'Ampezzo. <br />

## 1.1. Business Problem <a class="anchor" id="1.1."></a> 

Milan is the epicenter of Italian life and attracts companies, corporations, and people who move their core businesses and lives there. Because of the wide variety of services and opportunities on offer, housing prices in Milan can be high and vary considerably from one area of the city to another.

The goal of this project is to develop a tool for finding the most efficient solution, in terms of both *venues* and *price*, for buying a house in Milan. The project focuses on characterizing each neighborhood by its house prices and by the relevant venues in the surrounding area (such as restaurants, gyms, and parks). By adopting machine learning techniques such as clustering and regression, this project will answer the following questions: 

<font color = "black" size = "+1">If you want to buy or rent a house in Milan, which is the best neighborhood according to your capital, your lifestyle, and your needs?</font>

<font color = "black" size = "+1">If you want to eat sushi and visit a museum, which neighborhood should you visit? </font>

<font color = "black" size = "+1">If you are looking for an apartment close to a public transport station and an Italian restaurant, which neighborhood should you consider? </font>

<img src="https://traveldir.co/wp-content/uploads/2020/12/milan-info-map-of-italy-with-yellow-pin-marking-milano-centro-storico.jpg">

## 1.2. Target Audience <a class="anchor" id='1.2.'></a>

Real estate agencies. 

Housing investors.

Private individuals looking for the perfect place to rent or buy a house in Milan. 

Tourists.

## 2. Data <a class='anchor' id='2.'></a>

The data for this project was retrieved from multiple sources, chosen with close attention to their reliability:
1. [Milan borough dataset](#Borough) and [house market and rental values dataset](#Values): retrieved from the Italian Revenue Agency website ([source](https://www.agenziaentrate.gov.it/portale/schede/fabbricatiterreni/omi/banche-dati/quotazioni-immobiliari)). This source provides the list of Milan boroughs together with the market values and rental values of houses for the 1st half of 2020, broken down by house location and state of the property. These values reflect the negative influence of the COVID-19 pandemic on real estate markets. <br /> Accessing the CSV file requires registering on the website.
2. [Geo-locational information of Milan city center and the neighborhoods](#Location): the latitude and longitude of Milan city center and of each neighborhood, retrieved with the Google Maps Geocoding API.
3. [Surrounding venues for each neighborhood](#Venues): obtained using the Foursquare API.
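
Once latitude and longitude are available for the city center and for each neighborhood, the distance of a neighborhood from the center can be computed directly from the coordinates. The block below is a minimal sketch using the haversine formula; it is not taken from the original notebook. The function name `haversine_km`, the approximate center coordinates, and the sample neighborhood point are all assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Assumed, approximate coordinates of Milan city center (illustrative only).
MILAN_CENTER = (45.4642, 9.1900)
# Hypothetical neighborhood coordinates, not taken from the datasets.
sample_neighborhood = (45.4780, 9.2250)

distance = haversine_km(*MILAN_CENTER, *sample_neighborhood)
print(f"Distance from center: {distance:.2f} km")
```

A column of such distances, one per neighborhood, is the kind of feature that can then be compared against house values.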

These datasets make it possible to explore the data and apply ML algorithms to gain insights into Milan and inform the final user about the best locations. The [Milan borough dataset](#Borough) was used to determine the value of a house on the basis of the borough position and the state of the property. Neighborhood locations were fundamental to understanding the correlation between a neighborhood's position (in terms of distance from the Milan city center) and the value of its houses. These positions, together with the venue data, were essential to determine the clusters and identify the most common venues for each of them.
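
To make the idea of "most common venues" concrete, the sketch below tallies venue categories per neighborhood and keeps the most frequent ones. It is only an illustration under stated assumptions: the helper name `top_venue_categories` is hypothetical, the records are made-up sample data rather than real Foursquare results, and the notebook itself may compute this differently.

```python
from collections import Counter, defaultdict


def top_venue_categories(records, n=3):
    """Return the n most frequent venue categories for each neighborhood.

    `records` is an iterable of (neighborhood, category) pairs.
    """
    counts = defaultdict(Counter)
    for neighborhood, category in records:
        counts[neighborhood][category] += 1
    return {
        neighborhood: [category for category, _ in counter.most_common(n)]
        for neighborhood, counter in counts.items()
    }


# Made-up sample records, for illustration only.
sample_records = [
    ("Brera", "Italian Restaurant"),
    ("Brera", "Art Gallery"),
    ("Brera", "Italian Restaurant"),
    ("Navigli", "Cocktail Bar"),
    ("Navigli", "Cocktail Bar"),
    ("Navigli", "Pizza Place"),
]

print(top_venue_categories(sample_records, n=2))
```

The same per-neighborhood counts could also be normalized into frequencies and used as input features for clustering.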

## 3. Methodology <a class='anchor' id='3.'></a>

In the following sections:
- Libraries and external packages are loaded, and the Milan datasets are imported, cleaned, and explored.
- Neighborhood locations are visualized, and venues are downloaded and formatted to meet the required standards.

## 3.1. Requirements, Data Loading and Visualization <a class='anchor' id='3.1.'></a>

The first important step in data science is data retrieval: no analysis can be reliable and precise without good data and appropriate techniques and algorithms. <br/>
This analysis starts with data collection and cleaning, in order to gather all the data needed to achieve the goal of this study.

### Download Libraries

Uncomment the next cell if folium or geopy is not available.

In [8]:
# !conda install -c conda-forge folium=0.5.0 --yes 
# !conda install -c conda-forge geopy --yes 
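
Before uncommenting the install commands, it can help to check which packages are actually missing from the current environment. The block below is one possible way to do this with the standard library. The helper name `missing_packages` is hypothetical and not part of the original notebook.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# An empty list means both packages are already installed.
print(missing_packages(["folium", "geopy"]))
```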

Import the required libraries. 

In [10]:
import numpy as np # library to handle data in a vectorized manner

import pandas as pd # library for data analysis
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)

# Optimization and machine learning libraries
from scipy.optimize import curve_fit # fitting routines
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

from geopy.geocoders import Nominatim # convert an address into latitude and longitude values

import requests # library to handle requests

# Matplotlib and associated plotting modules
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
from matplotlib.colors import LinearSegmentedColormap

# Seaborn library for visualization
import seaborn as sns

import folium # map rendering library

print('Libraries imported.')

Libraries imported.
