## Predicting Price with Size, Location, and Neighborhood

**Goal: Use all the features in the dataset to improve the model for predicting the price of an apartment in Buenos Aires.**

Specific Goals:

- Build a model to predict apartment prices.
- Evaluate all the features in the dataset for use in the model.
- Create two deployments of the trained model.

Specifics:

1. Prepare data

   A. Import: `wrangle` function and list comprehension.

   B. Explore: null values, high- and low-cardinality features, leakage, multicollinearity.

   C. Split.

2. Build model.

3. Communicate results.

   A. `make_prediction` function.

   B. Interactive dashboard MVP.

```python
import warnings
from glob import glob

import numpy as np
import pandas as pd
import seaborn as sns
from category_encoders import OneHotEncoder
from ipywidgets import Dropdown, FloatSlider, IntSlider, interact
from sklearn.impute import SimpleImputer
# Regression models: the target (price) is continuous, so no classifier is needed
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.utils.validation import check_is_fitted

warnings.simplefilter(action="ignore", category=FutureWarning)
```

#### 1. Prepare Data

##### Import


```python
def wrangle(filepath):
    """Read one CSV of Buenos Aires listings and return a cleaned DataFrame."""
    # Read CSV file
    df = pd.read_csv(filepath)

    # Subset data: apartments in "Capital Federal" priced under 400,000 USD
    mask_ba = df["place_with_parent_names"].str.contains("Capital Federal", na=False)
    mask_apt = df["property_type"] == "apartment"
    mask_price = df["price_aprox_usd"] < 400_000
    df = df[mask_ba & mask_apt & mask_price]

    # Subset data: remove outliers in "surface_covered_in_m2" (keep 10th-90th percentile)
    low, high = df["surface_covered_in_m2"].quantile([0.1, 0.9])
    mask_area = df["surface_covered_in_m2"].between(low, high)
    # .copy() makes df independent, avoiding SettingWithCopyWarning below
    df = df[mask_area].copy()

    # Split "lat-lon" column into numeric "lat" and "lon" columns
    df[["lat", "lon"]] = df["lat-lon"].str.split(",", expand=True).astype(float)
    df.drop(columns="lat-lon", inplace=True)

    # Get neighborhood name: field at index 3 of the "|"-separated place string
    df["neighborhood"] = df["place_with_parent_names"].str.split("|", expand=True)[3]
    df.drop(columns="place_with_parent_names", inplace=True)

    return df
```
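The outline pairs the `wrangle` function with a list comprehension, which suggests the data arrives split across several CSV files. A minimal sketch of that pattern, assuming the files share one glob pattern: `load_all` (a hypothetical helper, not part of pandas) applies a loader to every matching file and concatenates the results. To stay self-contained, the sketch writes two tiny made-up CSV files to a temporary directory and uses `pd.read_csv` as the loader; in the notebook, `wrangle` would be passed instead.

```python
import os
import tempfile
from glob import glob

import pandas as pd


def load_all(pattern, loader):
    """Apply `loader` to every file matching `pattern` and stack the results.

    `load_all` is a hypothetical helper used for illustration.
    """
    files = sorted(glob(pattern))  # sort for a deterministic row order
    if not files:
        raise FileNotFoundError(f"No files match {pattern!r}")
    frames = [loader(path) for path in files]  # list comprehension: one DataFrame per file
    return pd.concat(frames, ignore_index=True)


# Illustration only: two made-up CSV files in a temporary directory
tmp_dir = tempfile.mkdtemp()
for i, prices in enumerate([[100_000, 150_000], [200_000]], start=1):
    path = os.path.join(tmp_dir, f"buenos-aires-real-estate-{i}.csv")
    pd.DataFrame({"price_aprox_usd": prices}).to_csv(path, index=False)

df = load_all(os.path.join(tmp_dir, "buenos-aires-real-estate-*.csv"), pd.read_csv)
print(df.shape)  # (3, 1)
```

With the real files the call would take the form `load_all(pattern, wrangle)`, where `pattern` is whatever glob matches the notebook's data files; the file names above are invented.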

##### Explore
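The four exploration checks in the outline can be sketched on a small made-up DataFrame (all column names and values are invented except `price_aprox_usd`, the column `wrangle` filters on). The thresholds (more than 50% missing, absolute correlation above 0.9) and the rule "any other column containing `price` leaks the target" are assumptions for illustration, not fixed rules.

```python
import numpy as np
import pandas as pd

target = "price_aprox_usd"

# Made-up stand-in for the wrangled data (illustration only)
df = pd.DataFrame({
    "surface_total_in_m2": [45.0, 60.0, 70.0, 52.0, 80.0],
    "surface_covered_in_m2": [40.0, 55.0, 62.0, 48.0, 75.0],
    "rooms": [2, 1, 3, 2, 2],
    "floor": [np.nan, np.nan, 4.0, np.nan, np.nan],
    "operation": ["sell"] * 5,
    "listing_url": ["u1", "u2", "u3", "u4", "u5"],
    "neighborhood": ["Palermo", "Recoleta", "Palermo", "Belgrano", "Recoleta"],
    "price_per_m2": [3000.0, 2700.0, 2600.0, 2700.0, 2400.0],
    target: [120_000.0, 150_000.0, 160_000.0, 130_000.0, 180_000.0],
})

# Null values: flag columns that are more than half empty (threshold is an assumption)
null_share = df.isnull().mean()
mostly_null = null_share[null_share > 0.5].index.tolist()

# Cardinality: a single value carries no information; one value per row cannot generalize
categorical = [col for col in df.columns if not pd.api.types.is_numeric_dtype(df[col])]
n_unique = df[categorical].nunique()
bad_cardinality = n_unique[(n_unique == 1) | (n_unique == len(df))].index.tolist()

# Leakage: columns derived from the price would not be known for a new listing
leaky = [col for col in df.columns if "price" in col and col != target]

df = df.drop(columns=mostly_null + bad_cardinality + leaky)

# Multicollinearity: numeric feature pairs whose absolute correlation exceeds 0.9
corr = df.drop(columns=target).select_dtypes("number").corr()
pairs = [
    (a, b)
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if abs(corr.loc[a, b]) > 0.9
]
print(df.columns.tolist())
print(pairs)
```

Typically one column of a highly correlated pair (here the two surface areas) is dropped. In the notebook, `sns.heatmap(corr)` would show the same correlation matrix visually.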

##### Split
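The split step separates the feature matrix from the target vector. A sketch, assuming `price_aprox_usd` is the target and using a made-up DataFrame; the feature list mirrors the columns `wrangle` produces and the section title (size, location, neighborhood), but the final selection depends on what the Explore step keeps.

```python
import pandas as pd

target = "price_aprox_usd"
features = ["surface_covered_in_m2", "lat", "lon", "neighborhood"]

# Made-up stand-in for the wrangled data (illustration only)
df = pd.DataFrame({
    "surface_covered_in_m2": [40.0, 55.0, 62.0],
    "lat": [-34.58, -34.59, -34.56],
    "lon": [-58.43, -58.39, -58.46],
    "neighborhood": ["Palermo", "Recoleta", "Belgrano"],
    "price_aprox_usd": [120_000.0, 150_000.0, 160_000.0],
})

X_train = df[features]  # feature matrix (DataFrame)
y_train = df[target]    # target vector (Series)
print(X_train.shape, y_train.shape)  # (3, 4) (3,)
```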

#### 2. Building Model

##### Baseline
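A common baseline predicts the mean training price for every listing; any useful model has to beat its error. A sketch using `mean_squared_error`, the metric imported at the top of the notebook, on a made-up target vector. Taking the square root (RMSE) puts the error back in US dollars, which is easier to read than squared dollars.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up training target (illustration only)
y_train = np.array([120_000.0, 150_000.0, 160_000.0, 130_000.0])

# Baseline: predict the mean training price for every apartment
y_mean = y_train.mean()
y_pred_baseline = np.full_like(y_train, y_mean)

mse_baseline = mean_squared_error(y_train, y_pred_baseline)
rmse_baseline = np.sqrt(mse_baseline)  # back in USD
print("Mean apartment price:", round(y_mean, 2))
print("Baseline RMSE:", round(rmse_baseline, 2))
```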

##### Iterate
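The imports point to a pipeline of one-hot encoding, imputation, and ridge regression. A sketch of that idea on made-up data. So that it runs with scikit-learn alone, it swaps in scikit-learn's `OneHotEncoder` inside a `ColumnTransformer` for the `category_encoders` `OneHotEncoder` the notebook imports; with `category_encoders`, the notebook version would likely look like `make_pipeline(OneHotEncoder(use_cat_names=True), SimpleImputer(), Ridge())`.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training data (illustration only); NaN mimics missing coordinates
X_train = pd.DataFrame({
    "surface_covered_in_m2": [40.0, 55.0, 62.0, 48.0, 70.0, 52.0],
    "lat": [-34.58, np.nan, -34.56, -34.60, -34.57, -34.59],
    "lon": [-58.43, -58.39, np.nan, -58.41, -58.45, -58.40],
    "neighborhood": ["Palermo", "Recoleta", "Belgrano", "Palermo", "Belgrano", "Recoleta"],
})
y_train = pd.Series([120_000.0, 150_000.0, 160_000.0, 125_000.0, 175_000.0, 140_000.0])

numeric = ["surface_covered_in_m2", "lat", "lon"]
categorical = ["neighborhood"]

preprocess = ColumnTransformer([
    # Fill missing numeric values with the column mean
    ("num", SimpleImputer(strategy="mean"), numeric),
    # One column per neighborhood; unseen neighborhoods become all zeros
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])

# Ridge adds an L2 penalty that shrinks coefficients, which helps with many one-hot columns
model = make_pipeline(preprocess, Ridge())
model.fit(X_train, y_train)
print(model.predict(X_train.head(2)))
```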

##### Evaluate
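Evaluation compares the model's error with the baseline's. A self-contained sketch on made-up data: it fits a small pipeline on one numeric feature, confirms the final estimator is fitted with `check_is_fitted` (imported at the top of the notebook), and reports training RMSE next to the mean-baseline RMSE. Scoring held-out test data, where available, is the more honest check.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.utils.validation import check_is_fitted

# Made-up data (illustration only): price rises roughly with covered area
X_train = pd.DataFrame({"surface_covered_in_m2": [40.0, 48.0, 55.0, 62.0, 70.0]})
y_train = pd.Series([110_000.0, 128_000.0, 150_000.0, 160_000.0, 185_000.0])

model = make_pipeline(SimpleImputer(), Ridge())
model.fit(X_train, y_train)
check_is_fitted(model[-1])  # raises NotFittedError if the Ridge step was not fitted

rmse_model = np.sqrt(mean_squared_error(y_train, model.predict(X_train)))
baseline_pred = np.full(len(y_train), y_train.mean())
rmse_baseline = np.sqrt(mean_squared_error(y_train, baseline_pred))
print(f"Training RMSE: {rmse_model:,.0f} USD")
print(f"Baseline RMSE: {rmse_baseline:,.0f} USD")
```

A model whose error is not clearly below the baseline's is not yet useful.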

#### 3. Communicate Results
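The outline's `make_prediction` function wraps the trained model so that one set of inputs returns one price. A sketch, assuming the model was trained on `surface_covered_in_m2`, `lat`, `lon`, and `neighborhood`; the signature and the made-up training data are hypothetical. Building a one-row DataFrame keeps the column names the pipeline was fitted on.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Made-up training data (illustration only)
X_train = pd.DataFrame({
    "surface_covered_in_m2": [40.0, 55.0, 62.0, 48.0, 70.0, 52.0],
    "lat": [-34.58, -34.59, -34.56, -34.60, -34.57, -34.59],
    "lon": [-58.43, -58.39, -58.46, -58.41, -58.45, -58.40],
    "neighborhood": ["Palermo", "Recoleta", "Belgrano", "Palermo", "Belgrano", "Recoleta"],
})
y_train = pd.Series([120_000.0, 150_000.0, 160_000.0, 125_000.0, 175_000.0, 140_000.0])

model = make_pipeline(
    ColumnTransformer([
        ("num", SimpleImputer(), ["surface_covered_in_m2", "lat", "lon"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["neighborhood"]),
    ]),
    Ridge(),
)
model.fit(X_train, y_train)


def make_prediction(area, lat, lon, neighborhood):
    """Return the predicted price (USD) for one apartment as a formatted string."""
    data = {
        "surface_covered_in_m2": area,
        "lat": lat,
        "lon": lon,
        "neighborhood": neighborhood,
    }
    df = pd.DataFrame(data, index=[0])  # one row, same column names as training
    prediction = model.predict(df).round(2)[0]
    return f"Predicted apartment price: ${prediction:,.2f}"


print(make_prediction(50, -34.58, -58.43, "Palermo"))
```

For the interactive dashboard MVP, the same function could be passed to `interact`, with an `IntSlider` for area, `FloatSlider` widgets for latitude and longitude, and a `Dropdown` of neighborhoods (all already imported in the notebook); the slider ranges would come from the training data.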