## Predicting Price with Location

**Project Goal: Building on the work done in ```0-price-and-size.ipynb```, construct a more complex ```wrangle``` function, use it to clean more data, and build a model that considers more features when predicting apartment price.**

In [1]:
import warnings

import numpy as np
import pandas as pd
import plotly.express as px
import plotly.graph_objects as go 
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.utils.validation import check_is_fitted

warnings.simplefilter(action="ignore", category=FutureWarning)

### 1. Prepare Data

#### Import

In [15]:
def wrangle(filepath):
    # Read csv file
    df = pd.read_csv(filepath)

    # Subset data: apartments in "Capital Federal" priced under 400,000 USD
    mask_ba = df["place_with_parent_names"].str.contains("Capital Federal")
    mask_apt = df["property_type"] == "apartment"
    mask_price = df["price_aprox_usd"] < 400_000

    # subset data: Remove outliers for "surface_covered_in_m2"
    low, high = df["surface_covered_in_m2"].quantile([0.1, 0.9])
    mask_area = df["surface_covered_in_m2"].between(low, high)

    df = df[mask_ba & mask_apt & mask_price & mask_area]

    return df


# Import the data with the wrangle function.
frame1 = wrangle("data/buenos-aires-real-estate-1.csv")
print(frame1.info())
frame1.head()

<class 'pandas.core.frame.DataFrame'>
Index: 1413 entries, 0 to 8604
Data columns (total 16 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   operation                   1413 non-null   object 
 1   property_type               1413 non-null   object 
 2   place_with_parent_names     1413 non-null   object 
 3   lat-lon                     1369 non-null   object 
 4   price                       1413 non-null   float64
 5   currency                    1413 non-null   object 
 6   price_aprox_local_currency  1413 non-null   float64
 7   price_aprox_usd             1413 non-null   float64
 8   surface_total_in_m2         1008 non-null   float64
 9   surface_covered_in_m2       1413 non-null   float64
 10  price_usd_per_m2            969 non-null    float64
 11  price_per_m2                1413 non-null   float64
 12  floor                       400 non-null    float64
 13  rooms                       1151 non-n

Unnamed: 0,operation,property_type,place_with_parent_names,lat-lon,price,currency,price_aprox_local_currency,price_aprox_usd,surface_total_in_m2,surface_covered_in_m2,price_usd_per_m2,price_per_m2,floor,rooms,expenses,properati_url
0,sell,apartment,|Argentina|Capital Federal|Villa Crespo|,"-34.6047834183,-58.4586812499",180000.0,USD,2729232.0,180000.0,120.0,110.0,1500.0,1636.363636,,4.0,,http://villa-crespo.properati.com.ar/12egq_ven...
4,sell,apartment,|Argentina|Capital Federal|Chacarita|,"-34.5846508988,-58.4546932614",129000.0,USD,1955949.6,129000.0,76.0,70.0,1697.368421,1842.857143,,,,http://chacarita.properati.com.ar/10qlv_venta_...
9,sell,apartment,|Argentina|Capital Federal|Villa Luro|,"-34.6389789,-58.500115",87000.0,USD,1319128.8,87000.0,48.0,42.0,1812.5,2071.428571,,,,http://villa-luro.properati.com.ar/12m82_venta...
29,sell,apartment,|Argentina|Capital Federal|Caballito|,"-34.615847,-58.459957",118000.0,USD,1789163.2,118000.0,,54.0,,2185.185185,,2.0,,http://caballito.properati.com.ar/11wqh_venta_...
40,sell,apartment,|Argentina|Capital Federal|Constitución|,"-34.6252219,-58.3823825",57000.0,USD,864256.8,57000.0,42.0,42.0,1357.142857,1357.142857,5.0,2.0,364.0,http://constitucion.properati.com.ar/k2f0_vent...


The model will consider apartment location, specifically latitude and longitude. According to the ```info``` output, location is stored in a single column, ```"lat-lon"```, whose data type is ```object``` (the pandas dtype typically used for strings). To build our model, latitude and longitude each need to be in their own column with data type ```float```.

##### Split "Lat-Lon" Column

The next step is to modify the ```wrangle``` function so that the DataFrame it returns has separate ```"lat"``` and ```"lon"``` columns in place of the ```"lat-lon"``` column, which is dropped.
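Before changing ```wrangle```, it can help to try the split in isolation. The following is a minimal sketch on a toy DataFrame, not the notebook's own cell: the name ```toy``` is hypothetical, the first two coordinate strings are copied from the ```head``` output above, and the third row is an assumed missing value added to show how gaps behave.

```python
import pandas as pd

# Hypothetical toy frame mimicking the "lat-lon" column.
# The last row is an assumed missing value, as read_csv would produce for an empty cell.
toy = pd.DataFrame(
    {
        "lat-lon": [
            "-34.6047834183,-58.4586812499",
            "-34.6389789,-58.500115",
            float("nan"),
        ]
    }
)

# Split each string on the comma; expand=True returns one column per piece.
parts = toy["lat-lon"].str.split(",", expand=True).astype(float)

# Store the pieces under descriptive names and drop the combined column.
toy[["lat", "lon"]] = parts
toy = toy.drop(columns="lat-lon")

print(toy.dtypes)
print(toy)
```

In this sketch the row with no coordinates ends up as ```NaN``` in both new columns, so missing locations are carried through the split rather than removed by it.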

In [23]:
def wrangle(filepath):
    # Read csv file
    df = pd.read_csv(filepath)

    # Subset data: apartments in "Capital Federal" priced under 400,000 USD
    mask_ba = df["place_with_parent_names"].str.contains("Capital Federal")
    mask_apt = df["property_type"] == "apartment"
    mask_price = df["price_aprox_usd"] < 400_000

    # subset data: Remove outliers for "surface_covered_in_m2"
    low, high = df["surface_covered_in_m2"].quantile([0.1, 0.9])
    mask_area = df["surface_covered_in_m2"].between(low, high)

    # Split "lat-lon" into separate float columns, then drop the original
    df[["lat", "lon"]] = df["lat-lon"].str.split(",", expand=True).astype(float)
    df.drop(columns="lat-lon", inplace=True)

    df = df[mask_ba & mask_apt & mask_price & mask_area]

    return df


# Import the data with the wrangle function.
frame1 = wrangle("data/buenos-aires-real-estate-1.csv")
print(frame1.info())
frame1.head()

<class 'pandas.core.frame.DataFrame'>
Index: 1413 entries, 0 to 8604
Data columns (total 17 columns):
 #   Column                      Non-Null Count  Dtype  
---  ------                      --------------  -----  
 0   operation                   1413 non-null   object 
 1   property_type               1413 non-null   object 
 2   place_with_parent_names     1413 non-null   object 
 3   price                       1413 non-null   float64
 4   currency                    1413 non-null   object 
 5   price_aprox_local_currency  1413 non-null   float64
 6   price_aprox_usd             1413 non-null   float64
 7   surface_total_in_m2         1008 non-null   float64
 8   surface_covered_in_m2       1413 non-null   float64
 9   price_usd_per_m2            969 non-null    float64
 10  price_per_m2                1413 non-null   float64
 11  floor                       400 non-null    float64
 12  rooms                       1151 non-null   float64
 13  expenses                    380 non-nu

Unnamed: 0,operation,property_type,place_with_parent_names,price,currency,price_aprox_local_currency,price_aprox_usd,surface_total_in_m2,surface_covered_in_m2,price_usd_per_m2,price_per_m2,floor,rooms,expenses,properati_url,lat,lon
0,sell,apartment,|Argentina|Capital Federal|Villa Crespo|,180000.0,USD,2729232.0,180000.0,120.0,110.0,1500.0,1636.363636,,4.0,,http://villa-crespo.properati.com.ar/12egq_ven...,-34.604783,-58.458681
4,sell,apartment,|Argentina|Capital Federal|Chacarita|,129000.0,USD,1955949.6,129000.0,76.0,70.0,1697.368421,1842.857143,,,,http://chacarita.properati.com.ar/10qlv_venta_...,-34.584651,-58.454693
9,sell,apartment,|Argentina|Capital Federal|Villa Luro|,87000.0,USD,1319128.8,87000.0,48.0,42.0,1812.5,2071.428571,,,,http://villa-luro.properati.com.ar/12m82_venta...,-34.638979,-58.500115
29,sell,apartment,|Argentina|Capital Federal|Caballito|,118000.0,USD,1789163.2,118000.0,,54.0,,2185.185185,,2.0,,http://caballito.properati.com.ar/11wqh_venta_...,-34.615847,-58.459957
40,sell,apartment,|Argentina|Capital Federal|Constitución|,57000.0,USD,864256.8,57000.0,42.0,42.0,1357.142857,1357.142857,5.0,2.0,364.0,http://constitucion.properati.com.ar/k2f0_vent...,-34.625222,-58.382382


Unnamed: 0,0,1
0,-34.604783,-58.458681
4,-34.584651,-58.454693
9,-34.638979,-58.500115
29,-34.615847,-58.459957
40,-34.625222,-58.382382
...,...,...
8589,-34.631591,-58.370191
8590,-34.604555,-58.418206
8593,-34.624002,-58.390588
8601,-34.601455,-58.378132


### 2. Build Model

### 3. Communicate Results