# Phase 2 Project: Non-Technical Presentation of Price Predictor

In [3]:
%run code/import_libs.py
#%run code/functions.py
%run code/functions_v1.1.py
%run code/Build_Forms_v1.1.py
%run code/initial_data_prep.py

%matplotlib inline

initial_pred = df.drop(columns=["price"]).copy()
initial_price = df[["price"]]

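# Mean sale price in the 2014-2015 training data, used later to rescale estimates.
# .iloc[0] selects by position; plain [0] on a label-indexed Series is deprecated.
mean_price_2014_2015 = initial_price.mean().iloc[0]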


pred_fin, price_fin = transform_data(initial_pred, initial_price)

# Create OLS linear model
pred_int = sm.add_constant(pred_fin)
model = sm.OLS(price_fin,pred_int).fit()
model.rsquared_adj


coef_df=model.params.reset_index()
coef_df.columns=["Column","Value"]


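def calculate_price(sqft_living, yr_built, zipcode, grade, waterfront, view,
                    sqft_lot, mean_price, coef_df=coef_df):
    """Estimate a sale price and rescale it to the given mean market price."""
    # '-' stands for "no view"; the model's category label is 'NONE'.
    if view == '-':
        view = 'NONE'

    # Map the truthy/falsy form value to the model's category label.
    waterfront = 'WATERFRONT' if waterfront else 'NONE'

    b0, b1, b2, b3, b4, b5, b6, b7 = get_coeff(yr_built, zipcode, grade, waterfront, view, coef_df)

    # Log-linear model: areas enter as logs, the remaining terms as offsets.
    y = np.exp(b0 + b1*np.log(sqft_living) + b2 + b3 + b4 + b5 + b6 + b7*np.log(sqft_lot))

    # Rescale from the 2014-2015 market to the supplied mean price, then round.
    y = round(y * (mean_price / mean_price_2014_2015))
    print('{:,.0f}'.format(y))
    return y, b0, b1, b2, b3, b4, b5, b6, b7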


title=form_items[0].children[0]
meanW=form_items[1].children[0]
zipW=form_items[2].children[1]
yearW=form_items[2].children[2]
gradeW=form_items[3].children[1]
livingW=form_items[4].children[1]
lotW=form_items[5].children[1]
viewW=form_items[6].children[1]
waterW=form_items[6].children[2]

output = widgets.interactive_output(calculate_price, {'mean_price' : meanW,
    'zipcode': zipW, 'yr_built':yearW, 'grade':gradeW,
    'sqft_living':livingW,'sqft_lot':lotW,'view':viewW,'waterfront':waterW } )

      
ui = widgets.VBox([form, output])

output.layout={'border': '3px solid green', 'width':'150px'}

#predictor=display(ui)




<img src="images/Home.jpg" alt="drawing" align="center"  width="450"/>  

#  Price Estimation and Analysis <br> for King County Houses

**Authors:** Dmitriy Fisch


## Overview

&emsp;&emsp; Our objective is to identify the key factors that affect house prices <br> &emsp;&emsp;   in King County
and to use those factors to predict house prices. <br>



## How can we estimate a fair price in a growing <br> Real Estate market?

According to a [Seattle Met staff](https://www.seattlemet.com/home-and-real-estate/2022/01/how-expensive-is-a-house-in-seattle-bellevue-redmond-washington) article (Jan 2022):
- #### In 2012, the average price of a house in King County was \$424,000
- #### By 2020, it had risen significantly, to \$880,000!
- #### In 2021, it was \$1,055,632.
<img src="images/seatle_met.png" alt="drawing" align="right"  width="300" />


**Knowing the average is not enough!**

  <img src="images/calculate.jpg" alt="drawing" align="right"  width="300" height="300"/>
 
 #  The Three Main Price Factors <br>
 ***
  * Location (zipcode)
  <br>
  * Quality of materials, construction and design (grade)
  <br>
  * Square Footage (not counting a basement)  
  

<img src="images/location-location-location-1.png" alt="drawing" align="right"  width="300"/>

### Location is important!
&nbsp;&nbsp;&nbsp;&nbsp;Depending on the zipcode, the average price per sq. ft. can increase by 300%!

 <img src="images/by_zipcode.png" alt="drawing" align="left"  width="1200"/>

<img src="images/Interior.jpg" alt="drawing" align="right"  width="300"/> 

#### Quality is important! 
Price per square foot doubles between the lowest and highest construction grades.
   <img src="images/by_grade.png" alt="drawing" align="left"  width="1000"/>


   
  <img src="images/size_matters.jpg" alt="drawing" align="right"  width="300"/> 

#### Size does matter!
Larger houses, as expected, sell for more money.
   <img src="images/by_size.png" alt="drawing" align="left"  width="1200"/>
   
   

  <img src="images/nice_view.jpg" alt="drawing" align="right"  width="400"/> 
  
  # Other Important Features <br>

  * #### View
  * #### Frontage along the water
  <br> 
  * more...
  

# Too many factors?
***
Challenges:
*  Understanding how the features above work together
*  Quantifying the joint effect of features
*  Building a predictive model
*  Building a front end for a customer




# Solution

*  Analyzing a 2014-2015 dataset of past sales
*  Identifying individual and joint factors
*  Preparing features for the model
*  Using the model to calculate the coefficients of all features
*  Testing the results



# Data
***
#### King County house sales dataset contains:
*  details for 22,000 sold houses
*  final sales prices 

All the data is from 2014-2015 




# Features Identified
#### Main Features: 
* House Sq footage 
* Grade of design and materials quality
* Zipcode
* Waterfront
* View



#### Additional Features: 
* Lot size
* Basement

#### Only marginal effect from:
* Number of bedrooms, bathrooms, and floors

# Data Modeling
#### An iterative approach to data modeling 
-  Calculated the efficiency of the basic features
-  Transformed the basic features to prepare them for modeling
-  Trained a model on a subset of the data
-  Chose the most efficient model based on the transformed features
-  Tested the chosen model against different subsets of the data
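
The steps above can be sketched in a few lines. This is a minimal illustration on synthetic data, not the notebook's actual model: it uses a single feature (living area) and a hand-written least-squares fit in place of the statsmodels OLS call, and the names `fit_log_log` and `predict` are hypothetical.

```python
import math
import random


def fit_log_log(sizes, prices):
    """Least-squares fit of log(price) = b0 + b1 * log(size)."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(p) for p in prices]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_mean - b1 * x_mean
    return b0, b1


def predict(b0, b1, size):
    """Map the log-scale prediction back to a price."""
    return math.exp(b0 + b1 * math.log(size))


# Synthetic data, made up for illustration: price grows with size, plus noise.
rng = random.Random(0)
sizes = [rng.uniform(800, 4000) for _ in range(200)]
prices = [math.exp(7.0 + 0.8 * math.log(s) + rng.gauss(0, 0.1)) for s in sizes]

# Train on one subset of the data, test on the held-out rest.
train_sizes, test_sizes = sizes[:150], sizes[150:]
train_prices, test_prices = prices[:150], prices[150:]
b0, b1 = fit_log_log(train_sizes, train_prices)

# Mean absolute percentage error on the held-out subset.
errors = [abs(predict(b0, b1, s) - p) / p for s, p in zip(test_sizes, test_prices)]
mape = sum(errors) / len(errors)
print(f"slope={b1:.2f}, held-out MAPE={mape:.1%}")
```

Taking logs of both price and size lets a straight-line fit capture a proportional relationship, which is why the prediction is passed back through `exp`.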


#### Building a Front End Tool:

In [6]:
display(ui)

VBox(children=(Box(box_style='success', children=(Box(children=(Text(value='Predicting House Sale Prices for K…
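
Behind the form, `calculate_price` combines the fitted coefficients into a log-linear estimate and rescales it by the ratio of the entered mean price to the 2014-2015 mean. The sketch below mirrors that calculation. The function name `estimate_price`, the dictionary layout, all coefficient values, and the base mean price are hypothetical placeholders, not values from the fitted model or the dataset; the grouping of the offsets (year built, zipcode, grade, waterfront, view) is inferred from the arguments passed to `get_coeff`.

```python
import math


def estimate_price(coefs, sqft_living, sqft_lot, mean_price, base_mean_price):
    """Log-linear estimate, rescaled from the training-period market to the current one."""
    log_price = (
        coefs["const"]
        + coefs["log_sqft_living"] * math.log(sqft_living)
        + coefs["log_sqft_lot"] * math.log(sqft_lot)
        + sum(coefs["offsets"])  # category terms, e.g. zipcode, grade, view
    )
    base_estimate = math.exp(log_price)
    # Scale by how much the mean market price has moved since the training period.
    return base_estimate * (mean_price / base_mean_price)


# Hypothetical coefficients, for illustration only.
coefs = {
    "const": 7.5,
    "log_sqft_living": 0.6,
    "log_sqft_lot": 0.05,
    "offsets": [0.0, 0.3, 0.2, 0.0, 0.1],
}

BASE_MEAN_PRICE = 540_000  # placeholder for the 2014-2015 mean, not the real figure

in_base_market = estimate_price(coefs, 2000, 5000, BASE_MEAN_PRICE, BASE_MEAN_PRICE)
in_2021_market = estimate_price(coefs, 2000, 5000, 1_055_632, BASE_MEAN_PRICE)
print(f"{in_base_market:,.0f} -> {in_2021_market:,.0f}")
```

The rescaling assumes every house moves with the county-wide mean, which is a simplification: it shifts all estimates by the same factor regardless of location or grade.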

# Testing
***

We verified that the tool works as expected:
* Multiple comparisons of predicted prices against actual prices
* Predicted price is within 90-110% of actual price (houses built after 1980)
* Predicted price is within 85-115% of actual price (houses built before 1980)
***
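
A check of this kind can be expressed as the share of predictions whose ratio to the actual price falls inside a band. The sketch below is one possible way to compute it; the function name `share_within_band` is hypothetical and the prices are made up for illustration, not taken from the dataset.

```python
def share_within_band(predicted, actual, low=0.90, high=1.10):
    """Return the fraction of predictions whose ratio to the actual price lies in [low, high]."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for p, a in zip(predicted, actual) if low <= p / a <= high)
    return hits / len(actual)


# Made-up prices, for illustration only.
actual = [500_000, 750_000, 1_000_000, 320_000]
predicted = [520_000, 700_000, 1_200_000, 300_000]

share = share_within_band(predicted, actual)
print(f"{share:.0%} of predictions are within 90-110% of the actual price")
```

Passing `low=0.85, high=1.15` gives the wider band used for older houses.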

# Conclusions
***

#### Considerations and Limitations:


* The tool can be effective for estimating a base price from known features
* In the future, the model should be retrained on more up-to-date data
* The presented prototype can be greatly improved with more advanced modeling
