# **Real Estate Regression**
## **Predicting Residential Real Estate Sale Price in Polk County Based On Selected Features**
### *Data Programming with Python - Group 13 Project*
### *Eric Schelin, Navin Mukraj, Sean O'Bryan*
#### Notebook for model


<hr size=8>

<img align="right" src="https://www.python.org/static/community_logos/python-logo.png">
<img align="left" src="https://www.dsm.city/_assets_/images/logo.png">

## Table of contents  
***
1. Import the prepared data
    - Read the prepared data from a csv into a dataframe
2. Build the regression model
    - Extract selected features and target to X and y. Split the data into train and test sets.
    - Fit the model
    - Output accuracy scores of regression
    - Export model to pkl file

### Import the prepared data

#### Read the prepared data from a csv into a dataframe

In [7]:
import pandas as pd
uri = "./data/prepared-data.csv" 
df = pd.read_csv(uri)
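
Before modelling, it can help to confirm that the columns used later in this notebook are present. The following is a minimal sketch, assuming a small stand-in dataframe with made-up values instead of the real CSV; `missing_columns` is a hypothetical helper written for this illustration, not part of pandas.

```python
import pandas as pd

# Columns this notebook relies on later (three features plus the target).
REQUIRED = ['bathrooms', 'year_built', 'total_living_area', 'price_adj']

def missing_columns(frame, required=REQUIRED):
    """Return the required column names that are absent from frame."""
    return [name for name in required if name not in frame.columns]

# Stand-in data with made-up values; the notebook would pass `df` instead.
sample = pd.DataFrame({
    'bathrooms': [1, 2],
    'year_built': [1950, 2001],
    'total_living_area': [900, 1800],
})

# The stand-in deliberately omits the target column.
print(missing_columns(sample))
```

An empty list means every feature and the target are available.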


### Build the regression model

#### Extract selected features and target to X and y. Split the data into train and test sets.

In [11]:
from sklearn.model_selection import train_test_split
features = ['bathrooms','year_built','total_living_area']
X = df[features].copy()
y = df[['price_adj']]

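# Note: test_size=0.8 holds out 80% of the rows for testing, so the model is fit on the remaining 20%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.8, random_state=1)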
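
The `test_size` argument is the fraction of rows held out for testing, not the fraction used for training. A small sketch on 100 made-up rows (the `X_demo` and `y_demo` arrays are assumptions for illustration only) shows how the two common settings differ:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 100 made-up rows with 3 columns, used only to show the resulting sizes.
X_demo = np.arange(300).reshape(100, 3)
y_demo = np.arange(100)

# test_size=0.8: most rows go to the test set.
Xa_train, Xa_test, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.8, random_state=1)

# test_size=0.2: most rows go to the training set.
Xb_train, Xb_test, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=1)

print(len(Xa_train), len(Xa_test))
print(len(Xb_train), len(Xb_test))
```

With `test_size=0.8`, the regression in this notebook is fit on one fifth of the prepared data and evaluated on the rest.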

#### Fit the model

In [12]:
from sklearn.linear_model import LinearRegression
# ordinary least squares is solved directly, so there is no iterative convergence step
lr = LinearRegression().fit(X_train, y_train)
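
`LinearRegression` fits ordinary least squares, which is solved directly rather than by iterative optimisation. A minimal sketch on synthetic data (the arrays and true coefficients below are made up for illustration) compares its result with a plain least-squares solve from NumPy:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data: 50 rows, 3 features, known coefficients plus a little noise.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(50, 3))
y_demo = 4.0 + X_demo @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=50)

model = LinearRegression().fit(X_demo, y_demo)

# Same problem solved directly: prepend a column of ones for the intercept.
A = np.column_stack([np.ones(len(X_demo)), X_demo])
beta, *_ = np.linalg.lstsq(A, y_demo, rcond=None)

# beta[0] is the intercept, beta[1:] are the feature coefficients.
print(np.allclose(beta[0], model.intercept_))
print(np.allclose(beta[1:], model.coef_))
```

Both approaches should agree up to floating-point tolerance, so refitting on the same training data gives the same model.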


#### Output accuracy scores of regression

In [14]:
from sklearn import metrics
import numpy as np

lrprd=lr.predict(X_test)

mse = metrics.mean_squared_error(y_test, lrprd)
# np.sqrt(mse) is the root mean squared error, in the same units as price_adj
print('Root Mean Squared Error (RMSE) ', round(np.sqrt(mse), 2))
print('R-squared (training) ', round(lr.score(X_train, y_train), 3))
print('R-squared (testing) ', round(lr.score(X_test, y_test), 3))
print('Intercept: ', lr.intercept_)
print('Coefficient: ', lr.coef_)

Root Mean Squared Error (RMSE)  80821.64
R-squared (training)  0.537
R-squared (testing)  0.588
Intercept:  [-1935090.68646401]
Coefficient:  [[25634.04700098   975.57887562   110.70426275]]
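
The coefficients are listed in the same order as `features`: `bathrooms`, `year_built`, `total_living_area`. As a worked example, the sketch below applies the fitted equation by hand to a hypothetical house (2 bathrooms, built in 1990, living area of 1500); the house is made up for illustration and is not a row from the data.

```python
# Intercept and coefficients copied from the output above.
intercept = -1935090.68646401
coef = {
    'bathrooms': 25634.04700098,
    'year_built': 975.57887562,
    'total_living_area': 110.70426275,
}

# Hypothetical house, made up for illustration.
house = {'bathrooms': 2, 'year_built': 1990, 'total_living_area': 1500}

# prediction = intercept + sum of (coefficient * feature value)
predicted = intercept + sum(coef[name] * house[name] for name in coef)
print(round(predicted, 2))
```

The result is roughly 223,636 in the units of `price_adj`. The large negative intercept is offset by the `year_built` term, since the year enters the equation as a raw value near 2000.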


#### Export model to pkl file

In [15]:
import joblib

with open("data/regressionModel.pkl", "wb") as fwb:
    joblib.dump(lr, fwb)
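
A saved model can be restored with `joblib.load` and used for prediction without refitting. The sketch below is a self-contained round trip, assuming a tiny stand-in model fit on made-up rows and a temporary directory rather than the notebook's `data/` folder:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Tiny stand-in model on made-up rows (bathrooms, year_built, total_living_area).
X_demo = np.array([[1, 1950, 900], [2, 1990, 1500], [3, 2005, 2400], [2, 1975, 1300]])
y_demo = np.array([90000.0, 180000.0, 320000.0, 150000.0])
model = LinearRegression().fit(X_demo, y_demo)

# Write the model to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "regressionModel.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model should reproduce the original predictions.
same = np.allclose(model.predict(X_demo), restored.predict(X_demo))
print(same)
```

Pickled models should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.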