# <center> Data Science Capstone
    
## <center> Predicting Buenos Aires House Prices with Linear Regression

![alt text](bsas.jpg)
<center> Figure 1. Buenos Aires, Argentina.

### 1. Description and Discussion of the Background

As part of the [IBM Data Science Professional Certificate Course](https://www.coursera.org/professional-certificates/ibm-data-science), I decided to develop a model that predicts the monetary value of a house in the Buenos Aires area, where I am currently staying. To do that, I will focus on a simple but fundamental model: Linear Regression. 
Using the Foursquare API [1], I will obtain the most common venues around given properties in Buenos Aires and use them to enrich the information about each house or apartment before predicting its price. A model like this would be valuable to a real estate agent, who could use its predictions on a daily basis.
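
As a rough illustration of what "most common venues" could mean in practice, the sketch below ranks venue categories per property from a small hand-made table. The venue rows, the column names `property_id` and `venue_category`, and the helper `top_venue_categories` are all hypothetical; in the project these rows would come from Foursquare API responses, which are not shown here.

```python
import pandas as pd

# Hypothetical venue rows, standing in for parsed Foursquare API responses.
venues = pd.DataFrame({
    "property_id": [0, 0, 0, 0, 1, 1, 1],
    "venue_category": ["Cafe", "Cafe", "Park", "Bakery", "Gym", "Gym", "Cafe"],
})

def top_venue_categories(venues, n=2):
    """Return the n most frequent venue categories for each property."""
    counts = (
        venues.groupby(["property_id", "venue_category"])
        .size()
        .reset_index(name="count")
        # Sort by count (descending); ties are broken alphabetically.
        .sort_values(["property_id", "count", "venue_category"],
                     ascending=[True, False, True])
    )
    top = counts.groupby("property_id").head(n)
    return top.groupby("property_id")["venue_category"].apply(list).to_dict()

top_by_property = top_venue_categories(venues)
print(top_by_property)
```

Ranked categories like these could later be encoded as extra features for each property, although the exact encoding is a design choice left open here.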

The data is provided by Properati [2], which is a Latin American property search site. On its [web page](https://www.properati.com.ar/?utm_source=properati.com&utm_medium=organic&utm_campaign=redir-from-ip), you can find links to different tools and datasets to use freely in your projects. 

#### 1.1. Business problem

The problem posed for this project is to develop a regression model that predicts the price of a property in the city of Buenos Aires. To solve it, the following objectives are established:

- Data acquisition.

- Exploratory analysis and dataset cleaning.

- Construction of a linear regression model to make predictions about the price of properties.
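
To make the last objective concrete, here is a minimal sketch of fitting a linear regression by ordinary least squares with NumPy. The toy surfaces and prices are made up for illustration and are not taken from the Properati dataset; using covered surface as the single feature is likewise an assumption, not the feature set used in the project.

```python
import numpy as np

# Made-up toy data: covered surface (m2) and price (USD), exactly linear.
surface = np.array([40.0, 55.0, 35.0, 80.0, 100.0])
price = 1500.0 * surface + 10000.0

# Design matrix with an intercept column: price ~ b0 + b1 * surface.
X = np.column_stack([np.ones_like(surface), surface])

# Ordinary least squares solution.
coef, *_ = np.linalg.lstsq(X, price, rcond=None)
intercept, slope = coef

# Predict the price of a hypothetical 60 m2 property.
predicted = intercept + slope * 60.0
print(round(slope, 2), round(intercept, 2), round(predicted, 2))
```

On real listings the relationship will be noisy, so the fitted coefficients would only approximate the trend instead of recovering it exactly as they do on this toy data.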


#### 1.2. Interest

First and foremost, this project could give Properati an automatic appraiser for the real estate properties it markets.


### 2. Data

#### 2.1 Data Source

The Properati real estate company provides a dataset extracted from its internal databases, covering the first half of 2017. 

(Link for download: https://drive.google.com/file/d/0BzVrTKc02N8qNUdDSExBQlFTNlU/view)

The dataset is a 145 MB .csv file, comma-delimited and encoded as 'utf-8-sig'.
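
The `utf-8-sig` codec differs from plain `utf-8` in how it treats a leading byte order mark (BOM): decoded as `utf-8`, the BOM stays in the text as an invisible `\ufeff` character, while `utf-8-sig` strips it. The short sketch below shows the difference on a made-up two-column header, using only the Python standard library; the sample bytes are illustrative and not taken from the Properati file.

```python
# A made-up CSV header as bytes, starting with the UTF-8 byte order mark.
raw = b"\xef\xbb\xbfoperation,property_type\n"

plain = raw.decode("utf-8")        # BOM survives as '\ufeff'
cleaned = raw.decode("utf-8-sig")  # BOM is stripped

print(repr(plain.split(",")[0]))
print(repr(cleaned.split(",")[0]))
```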

In [2]:
#dependencies
import pandas as pd

#load data
df = pd.read_csv("properatti.csv", sep=',', encoding='utf-8-sig')

#preliminary look at the data
df.head()

| Unnamed: 0.1 | Unnamed: 0 | operation | property_type | place_name | place_with_parent_names | country_name | state_name | geonames_id | lat-lon | lat | ... | surface_covered_in_m2 | price_usd_per_m2 | price_per_m2 | floor | rooms | expenses | properati_url | description | title | image_thumbnail |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | sell | PH | Mataderos | \|Argentina\|Capital Federal\|Mataderos\| | Argentina | Capital Federal | 3430787.0 | -34.6618237,-58.5088387 | -34.661824 | ... | 40.0 | 1127.272727 | 1550.0 | | | | http://www.properati.com.ar/15bo8_venta_ph_mat... | 2 AMBIENTES TIPO CASA PLANTA BAJA POR PASILLO,... | 2 AMB TIPO CASA SIN EXPENSAS EN PB | https://thumbs4.properati.com/8/BluUYiHJLhgIIK... |
| 1 | 1 | sell | apartment | La Plata | \|Argentina\|Bs.As. G.B.A. Zona Sur\|La Plata\| | Argentina | Bs.As. G.B.A. Zona Sur | 3432039.0 | -34.9038831,-57.9643295 | -34.903883 | ... | | | | | | | http://www.properati.com.ar/15bob_venta_depart... | Venta de departamento en décimo piso al frente... | VENTA Depto 2 dorm. a estrenar 7 e/ 36 y 37 ... | https://thumbs4.properati.com/7/ikpVBu2ztHA7jv... |
| 2 | 2 | sell | apartment | Mataderos | \|Argentina\|Capital Federal\|Mataderos\| | Argentina | Capital Federal | 3430787.0 | -34.6522615,-58.5229825 | -34.652262 | ... | 55.0 | 1309.090909 | 1309.090909 | | | | http://www.properati.com.ar/15bod_venta_depart... | 2 AMBIENTES 3ER PISO LATERAL LIVING COMEDOR AM... | 2 AMB 3ER PISO CON ASCENSOR APTO CREDITO | https://thumbs4.properati.com/5/SXKr34F_IwG3W_... |
| 3 | 3 | sell | PH | Liniers | \|Argentina\|Capital Federal\|Liniers\| | Argentina | Capital Federal | 3431333.0 | -34.6477969,-58.5164244 | -34.647797 | ... | | | | | | | http://www.properati.com.ar/15boh_venta_ph_lin... | PH 3 ambientes con patio. Hay 3 deptos en lote... | PH 3 amb. cfte. reciclado | https://thumbs4.properati.com/3/DgIfX-85Mog5SP... |
| 4 | 4 | sell | apartment | Centro | \|Argentina\|Buenos Aires Costa Atlántica\|Mar de... | Argentina | Buenos Aires Costa Atlántica | 3435548.0 | -38.0026256,-57.5494468 | -38.002626 | ... | 35.0 | 1828.571429 | 1828.571429 | | | | http://www.properati.com.ar/15bok_venta_depart... | DEPARTAMENTO CON FANTÁSTICA ILUMINACIÓN NATURA... | DEPTO 2 AMB AL CONTRAFRENTE ZONA CENTRO/PLAZA ... | https://thumbs4.properati.com/5/xrRqlNcSI_vs-f... |
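
The preview already shows empty cells in columns such as `floor`, `rooms` and `expenses`. A quick way to quantify this is to count missing values per column. The sketch below does so on a tiny hand-built frame that mimics a few of the columns above; the values are copied from the preview where visible, and the frame name `sample` is hypothetical.

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the first rows of the dataset (a few columns only).
sample = pd.DataFrame({
    "property_type": ["PH", "apartment", "apartment", "PH", "apartment"],
    "surface_covered_in_m2": [40.0, np.nan, 55.0, np.nan, 35.0],
    "rooms": [np.nan] * 5,
})

print(sample.shape)            # (rows, columns)
missing = sample.isna().sum()  # missing values per column
print(missing)
```

Running the same two calls on the full `df` would show which fields are complete enough to serve as model features.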


#### 2.2 Data description

The information of each property includes the following data (fields):

- Unnamed (Index)
- operation (Operation type: sell, rent) 
- property_type (Property type: house, apartment, PH) 
- place_name (Name of the neighborhood) 
- place_with_parent_names  
- country_name 
- state_name (Name of the state/city/borough) 
- geonames_id (geographic ID) 
- lat-lon (Latitude and Longitude)
- lat (Latitude)
- lon (Longitude) 
- price (Price of the real estate)  
- currency (Unit of the monetary value: ARS, USD, PEN, UYU) 
- price_aprox_local_currency
- price_aprox_usd
- surface_total_in_m2
- surface_covered_in_m2
- price_usd_per_m2
- price_per_m2
- floor
- rooms
- expenses
- properati_url
- description
- title
- image_thumbnail

The dataset contains a total of 26 fields/columns and 121200 records. However, as stated above, this project focuses only on the city of Buenos Aires, which results in 32316 records.

In [3]:
# keep only properties in the city of Buenos Aires ('Capital Federal')
df = df[df['state_name']=='Capital Federal'] 
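
A small sanity check can confirm that the filter keeps only the intended rows. The sketch below reproduces the same boolean-mask filter on a hand-made frame; the frame `toy` and the name `caba` are illustrative, with `place_name` and `state_name` values taken from the preview above. Adding `.copy()` is an optional choice, not part of the original cell, that avoids pandas' `SettingWithCopyWarning` if the filtered frame is modified later.

```python
import pandas as pd

toy = pd.DataFrame({
    "place_name": ["Mataderos", "La Plata", "Mataderos", "Liniers", "Centro"],
    "state_name": ["Capital Federal", "Bs.As. G.B.A. Zona Sur",
                   "Capital Federal", "Capital Federal",
                   "Buenos Aires Costa Atlántica"],
})

# Records per state before filtering.
print(toy["state_name"].value_counts())

# Same filter as in the cell above, on an independent copy.
caba = toy[toy["state_name"] == "Capital Federal"].copy()
print(len(caba), caba["state_name"].unique())
```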

## References

[1] Foursquare API

[2] Properati page

