# Home Flipping Price Model for King County, Washington, USA

![Home Flip](https://www.jaxdailyrecord.com/sites/default/files/styles/sliders_and_planned_story_image_870x580/public/196536_standard.png?itok=Gr72b8cv)

## Business Understanding

### Goal: Provide our house flipping company with a **"Cookie Cutter House"**-focused price model to better understand the variability in home prices in King County, Washington, USA

Our stakeholder is a home flipping company. House flipping is the process of purchasing a residential property, renovating it, and then selling it for a profit.

70% Rule

The best-practice rule for house flipping is the 70% rule: the amount spent on purchasing the home and its renovations should be no more than 70% of the home's after-repair value. Because the purchase price makes up the majority of that budget, it is extremely important to know what a home is worth.
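As a rough illustration of the arithmetic behind the rule, the sketch below computes the highest purchase price that keeps purchase plus renovation within 70% of the after-repair value. The function name `max_purchase_price` and the dollar figures are hypothetical and chosen only for the example; they are not taken from the dataset or from the model.

```python
def max_purchase_price(after_repair_value, renovation_cost, rule=0.70):
    """Highest purchase price that keeps purchase + renovation within `rule` of ARV."""
    return rule * after_repair_value - renovation_cost


# Illustrative numbers only (assumed, not from the King County data).
arv = 500_000         # expected sale price after renovation
renovation = 50_000   # planned renovation budget

budget = max_purchase_price(arv, renovation)
print(f"Maximum purchase price: ${budget:,.0f}")
```

With these assumed numbers, 70% of the after-repair value is $350,000, so after setting aside the renovation budget the purchase should not exceed about $300,000.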

The Cookie Cutter Model

The goal in house flipping is not to set a home apart from those around it but to bring homes that are underperforming price-wise up to par with the surrounding neighborhood. The neighborhood determines the buying power of potential residents.

As explained, **neighborhood** and **renovation** are important factors for home flipping. The input variables for the home price model needed to relate to these factors.






In [11]:
import pandas as pd
import numpy as np
from scipy import stats
from sklearn.preprocessing import OneHotEncoder

## Exploratory Data Analysis

The data used in this model comes from a dataset of 2014-2015 house sales in King County, Washington, USA.

In [12]:
# Read in Dataset

df = pd.read_csv('data/kc_house_data.csv')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 21597 entries, 0 to 21596
Data columns (total 21 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   id             21597 non-null  int64  
 1   date           21597 non-null  object 
 2   price          21597 non-null  float64
 3   bedrooms       21597 non-null  int64  
 4   bathrooms      21597 non-null  float64
 5   sqft_living    21597 non-null  int64  
 6   sqft_lot       21597 non-null  int64  
 7   floors         21597 non-null  float64
 8   waterfront     19221 non-null  object 
 9   view           21534 non-null  object 
 10  condition      21597 non-null  object 
 11  grade          21597 non-null  object 
 12  sqft_above     21597 non-null  int64  
 13  sqft_basement  21597 non-null  object 
 14  yr_built       21597 non-null  int64  
 15  yr_renovated   17755 non-null  float64
 16  zipcode        21597 non-null  int64  
 17  lat            21597 non-null  float64
 18  long  

The features above were used to produce a 'model ready' dataset. The entire process can be seen in the Data_Exploration.ipynb notebook stored in the Appendix folder. Some of the key changes made:

### Removal of Outliers
Only homes within three standard deviations of the mean for every numerical feature used in the analysis were kept.
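One common way to apply a three-standard-deviation filter is with z-scores. The sketch below is a minimal example under that assumption; the actual cleaning steps live in the Data_Exploration.ipynb notebook and may differ. The helper `remove_outliers`, the column list, and the synthetic data are all hypothetical.

```python
import numpy as np
import pandas as pd
from scipy import stats


def remove_outliers(frame, columns, threshold=3.0):
    """Keep rows whose absolute z-score is below `threshold` in every listed column."""
    z_scores = np.abs(stats.zscore(frame[columns].to_numpy(), axis=0))
    keep = (z_scores < threshold).all(axis=1)
    return frame.loc[keep]


# Synthetic data for illustration; this is not the King County dataset.
rng = np.random.default_rng(0)
toy = pd.DataFrame({
    "price": rng.normal(500_000, 100_000, 1_000),
    "sqft_living": rng.normal(2_000, 500, 1_000),
})
toy.loc[0, "price"] = 5_000_000  # inject one obvious outlier

toy_cleaned = remove_outliers(toy, ["price", "sqft_living"])
print("Rows removed:", len(toy) - len(toy_cleaned))
```

Requiring every listed column to pass the threshold means a home is dropped if it is extreme in any single feature, which matches the "for every numerical feature" wording above.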

In [13]:
df_cleaned = pd.read_csv('data/cleaned_data.csv')
n_removed = df.shape[0] - df_cleaned.shape[0]
pct_removed = round(n_removed / df.shape[0] * 100, 2)
print('Number of Homes Removed:', n_removed, '| Percent of Homes Removed:', pct_removed, '%')

Number of Homes Removed: 1067 | Percent of Homes Removed: 4.94 %


### Feature Engineering of Relative Living Area

To account for the importance of neighborhood to house flipping, a new feature called relative living area was created.

In [14]:
df['relative_living_area'] = df['sqft_living'] / df['sqft_living15']
df['relative_living_area'].describe()

count    21597.000000
mean         1.053144
std          0.320311
min          0.187279
25%          0.881188
50%          1.000000
75%          1.161039
max          6.000000
Name: relative_living_area, dtype: float64

```sqft_living```: The livable space in sqft of the home

```sqft_living15```: The average livable space in sqft of the nearest 15 houses to the home

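Taking the quotient gives a new feature that shows how much living space a home has relative to its neighbors: a value below 1 means the home is smaller than the average of its 15 nearest neighbors, and a value above 1 means it is larger.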

![Relative Living Area](images/relative_living.png)
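To make the ratio easier to read, the sketch below computes it for three made-up homes and labels each one by how it compares with its neighbors. The column names follow the dataset, but the values, the helper `describe_ratio`, and the `size_vs_neighbors` column are hypothetical and are not part of the model.

```python
import pandas as pd

# Toy values for illustration only; not rows from the King County data.
homes = pd.DataFrame({
    "sqft_living": [1200, 2000, 3000],
    "sqft_living15": [2000, 2000, 2000],
})
homes["relative_living_area"] = homes["sqft_living"] / homes["sqft_living15"]


def describe_ratio(ratio):
    """Translate the ratio into a plain-language comparison with the neighbors."""
    if ratio < 1:
        return "smaller than neighbors"
    if ratio > 1:
        return "larger than neighbors"
    return "same size as neighbors"


homes["size_vs_neighbors"] = homes["relative_living_area"].apply(describe_ratio)
print(homes)
```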

## Iterative Modeling

### Baseline Model

### Simple Model

### Final Model

## Conclusion

## Citations

https://info.kingcounty.gov/assessor/esales/Glossary.aspx?type=r

https://www.ramseysolutions.com/real-estate/how-to-flip-a-house

https://www.investopedia.com/articles/mortgages-real-estate/08/house-flip.asp
