# Model Insights

---
### Objective
This notebook seeks to finalize the model process and pull out insights to make conclusive statements. In notebook 04 I decided that the lasso model would give me the most predictive and meaningful model. The coefficients produced by this model provide interpretation into the specific effect of each feature on the sale price of homes in Ames, Iowa. 

---
#### External Libraries Import

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler , PolynomialFeatures
from sklearn.linear_model import LassoCV
from sklearn.metrics import r2_score
import warnings
warnings.filterwarnings('ignore')

#### Read Cleaned and Preprocessed Datasets

In [2]:
df_train = pd.read_csv('../datasets/preprocessed_train.csv')

#### Interesting Features List

In [3]:
interesting_features = ['neighborhood','overall_qual', 'year_built', 'year_remod/add', 'exterior_1st',
                        'mas_vnr_type', 'exter_qual', 'bsmt_qual', 'total_bsmt_sf', 'gr_liv_area',
                        'full_bath', 'kitchen_qual', 'fireplaces', 'garage_area']

## Run Final Model

#### Transform and Scale Data

In [4]:
X = df_train[interesting_features]
y = df_train['saleprice']

poly = PolynomialFeatures(include_bias = False)
X_poly = poly.fit_transform(X)
X_poly = pd.DataFrame(X_poly , columns = poly.get_feature_names(interesting_features))
interesting_features = X_poly.columns

ss = StandardScaler()
X_poly_sc = ss.fit_transform(X_poly)

#### Fit Model

In [9]:
lasso = LassoCV()
lasso.fit(X_poly_sc , y)
y_hat = lasso.predict(X_poly_sc)
R2 = r2_score(y_hat , y)
print('Final model produces an R-squared value of {}.' .format(round((R2),4)))

Final model produces an R-squared value of 0.9018.


### Produce Dataframe of  Strong Coefficients

In [11]:
# create a coefficient dataframe 
final_model = pd.DataFrame(lasso.coef_, columns=['coefs'])
final_model['abs_coefs'] = abs(lasso.coef_)
final_model.index = X_poly.columns
final_model = final_model.sort_values('abs_coefs', ascending=False)
mask = final_model['abs_coefs'] > 0
final_model = final_model[mask]
final_model.head(15)

Unnamed: 0,coefs,abs_coefs
gr_liv_area kitchen_qual,17841.609601,17841.609601
overall_qual gr_liv_area,17392.086548,17392.086548
year_built exter_qual,-13082.828794,13082.828794
exter_qual total_bsmt_sf,10534.075301,10534.075301
bsmt_qual total_bsmt_sf,9781.930129,9781.930129
exter_qual gr_liv_area,8940.826663,8940.826663
year_built kitchen_qual,-8208.14417,8208.14417
full_bath^2,8107.698553,8107.698553
kitchen_qual garage_area,7860.539926,7860.539926
overall_qual total_bsmt_sf,7627.711348,7627.711348


- The dataframe above displays the sign and magnitude of the top 15 coefficients produced by the model. The features containing two features represent the interaction terms created by sklearn's PolynomialFeatures.

# Insights

Key Metrics:
   - R-sqaured
       - The final model produces an R-squared value of 90.18%. This means that the predictions calculated by the lasso regression model account for just over 90% of variation in the true sale prices of homes in Ames, Iowa. The score specifically implies that the features included in this model predict the true sale prices 90% better than the mean's ability to predict sale price.
<br><br>
   - Coefficients 
       - Each coefficient can be interpreted based on its sign and magnitude. A negative value indicates that the has a negative relationship with the sale price of house. The strongest predictors, based on size of coefficient, turned out to be the second degree features. Because the features are scaled, their specific effect on sale price is based on increases or decreases of standard dev. That is, for example, the strongest predictor (gr_liv_area kitchen_qual) for every 1 standard deviation increase of that feature, the model predicts the price to increase by \$17,841.61.
       <br><br>
       - Strongest positive predictors:
           - gr_liv_area * kitchen_qual
           - overall_qual * gr_liv_area
           - exter_qual * total_bsmt_sf
           <br><br>
       - Strongest negative predictors:
           - year_built * exter_qual
           - year_built * kitchen_qual
           - total_bsmt_sf * full_bath

# Conclusion

This model is conclusive in that it provides statistically significant evidence of predictive power regarding house prices at sale in Ames, Iowa. Therefore, with new data, this model can predict the price at sale of a house in Ames, Iowa with 90% accuracy. Although this model performs relatively well, the results are restricted by a limited dataset focused on only one city between the years 2006 and 2010. An interesting study to extend this experiment would be to collect data on the exact same list of features for a different city in the United States and apply this model. Comparing the R-squared value and coefficients could lead to profound insight into house prices at sale. Collecting more recent data on homes in Ames would also provide insight into how the effect on sale prices have changed over time.
<br><br>
The features that appeared most frequently as strong predictors were above ground square feet, the overall quality, the exterior quality, the square footage of the basement, the kitchen quality, the number of full bathrooms, and the year the house was built. 
<br><br>For homeowners or real estate investors looking to maximize the selling price of a home: 
- increase the square footage of your home (basement, garage)
- improve the quality of the outside of your house
- improve the quality of your kitchen
- add a full bathroom