# Model Tuning
---

### Objective
This notebook seeks to optimize the benchmark model by implementing ridge, lasso, and elastic net regularization. After creating new second-degree features, scaling, and transforming the data, I compare cross-validated R-squared scores to choose the most predictive model. Using my best model, I will predict house prices in Ames, Iowa.

---
#### External Libraries Import

In [1]:
import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler , PolynomialFeatures
from sklearn.linear_model import LinearRegression, RidgeCV, LassoCV, ElasticNetCV
from sklearn.model_selection import cross_val_score , train_test_split , KFold
import warnings
warnings.filterwarnings('ignore')

#### Read Cleaned and Preprocessed Datasets

In [2]:
df_train = pd.read_csv('../datasets/preprocessed_train.csv')
df_test = pd.read_csv('../datasets/preprocessed_test.csv')

### Train-test Split

In [3]:
# get same interesting features list

interesting_features = ['neighborhood','overall_qual', 'year_built', 'year_remod/add', 'exterior_1st',
                        'mas_vnr_type', 'exter_qual', 'bsmt_qual', 'total_bsmt_sf', 'gr_liv_area',
                        'full_bath', 'kitchen_qual', 'fireplaces', 'garage_area']

In [4]:
X = df_train[interesting_features]
Xtest = df_test[interesting_features]
y = df_train['saleprice']

X_train, X_test, y_train, y_test = train_test_split(X , y , test_size = 0.3 , random_state = 77)

### Transform Data to Prepare for Modeling

In [5]:
# run poly features to create squared and interaction terms (degree 2 is the default)
# fit on the training set only, then apply the same fitted transformer to both test sets

poly = PolynomialFeatures(degree = 2 , include_bias = False)
X_train = poly.fit_transform(X_train)
X_test = poly.transform(X_test)
Xtest = poly.transform(Xtest)
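
To make the polynomial step concrete, here is a minimal sketch on a made-up two-column frame. The column names `a` and `b`, their values, and the `toy_` variable names are all hypothetical and are not part of the Ames data. It fits `PolynomialFeatures` once on the training rows and then reuses the fitted transformer on new rows, which is the usual scikit-learn pattern for keeping train and test columns aligned.

```python
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures

# hypothetical toy data, used only to illustrate the transformation
toy_train = pd.DataFrame({'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0]})
toy_new = pd.DataFrame({'a': [10.0], 'b': [20.0]})

# degree 2 without a bias column yields: a, b, a^2, a*b, b^2
toy_poly = PolynomialFeatures(degree=2, include_bias=False)
toy_train_poly = toy_poly.fit_transform(toy_train)  # learn the column layout here
toy_new_poly = toy_poly.transform(toy_new)          # reuse it on unseen rows

print(toy_train_poly.shape)
print(toy_new_poly)
```

Two input columns become five output columns, so the fourteen interesting features above expand to many more once squared and interaction terms are included.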

In [6]:
# scale each feature using standard scaler 

ss = StandardScaler()  
X_train_sc = ss.fit_transform(X_train)
X_test_sc = ss.transform(X_test)
Xtest_sc = ss.transform(Xtest)
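
As a small illustration of why the scaler is fit on the training data and only applied to the other sets, the sketch below standardizes a hypothetical two-column array (the values and `toy_` names are invented for this example). The training columns end up with mean 0 and standard deviation 1, while the new row is scaled with the training mean and standard deviation rather than its own.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# hypothetical toy data with two features on very different scales
toy_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
toy_new = np.array([[2.0, 400.0]])

toy_ss = StandardScaler()
toy_train_sc = toy_ss.fit_transform(toy_train)  # learns mean_ and scale_ from training rows
toy_new_sc = toy_ss.transform(toy_new)          # applies the training statistics unchanged

print(toy_ss.mean_)
print(toy_train_sc.mean(axis=0), toy_train_sc.std(axis=0))
print(toy_new_sc)
```

Putting every feature on the same scale matters for ridge, lasso, and elastic net because their penalties act on coefficient size, which depends on the units of each feature.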

### Compare Four Different Models

In [7]:
lr = LinearRegression()
lasso = LassoCV() 
ridge = RidgeCV()
en = ElasticNetCV(l1_ratio = [.1 , .5 , .7 , .9 , .95 , .99 , 1])

# use a shuffled 10-fold KFold so that all four models are scored on the same splits
kf = KFold(n_splits = 10 , shuffle = True , random_state = 77)

lr_cv = cross_val_score(lr, X_train_sc, y_train, cv=kf).mean()
lasso_cv = cross_val_score(lasso, X_train_sc, y_train, cv=kf).mean()
ridge_cv = cross_val_score(ridge, X_train_sc, y_train, cv=kf).mean()
en_cv = cross_val_score(en, X_train_sc, y_train, cv=kf).mean()

print('Linear regression produces an average R-squared of {}.' .format(lr_cv))
print('Lasso regression produces an average R-squared of {}.' .format(lasso_cv))
print('Ridge regression produces an average R-squared of {}.' .format(ridge_cv))
print('Elastic net regression produces an average R-squared of {}.' .format(en_cv))

Linear regression produces an average R-squared of 0.8946040760380514.
Lasso regression produces an average R-squared of 0.9022888169099292.
Ridge regression produces an average R-squared of 0.9038493250636682.
Elastic net regression produces an average R-squared of 0.9022888169099292.


- Comparing the cross-validation scores lets me identify the best model for predicting house prices. Ridge regression produces the highest score (0.9038), which implies that shrinking the coefficients improves the model's explanatory power relative to plain linear regression (0.8946). However, the lasso score (0.9023) is nearly as high, and because lasso can shrink some coefficients to exactly zero, it yields a more interpretable model. I will therefore make predictions using lasso.
<br><br>
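
The interpretability argument rests on lasso setting some coefficients to exactly zero while ridge only shrinks them. The sketch below illustrates that difference on synthetic data from `make_regression`. It does not use the Ames data, and every name ending in `_demo` is hypothetical. How many coefficients lasso removes depends on the data and on the penalty strength chosen by cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV, RidgeCV

# synthetic data: 20 features, only 5 of which actually drive the target
X_demo, y_demo = make_regression(n_samples=200, n_features=20, n_informative=5,
                                 noise=10.0, random_state=77)

lasso_demo = LassoCV(cv=5).fit(X_demo, y_demo)
ridge_demo = RidgeCV().fit(X_demo, y_demo)

# count coefficients that each model sets to exactly zero
n_zero_lasso = int(np.sum(lasso_demo.coef_ == 0))
n_zero_ridge = int(np.sum(ridge_demo.coef_ == 0))

print('lasso coefficients equal to zero:', n_zero_lasso)
print('ridge coefficients equal to zero:', n_zero_ridge)
```

The same count can be taken on a fitted model from this notebook by inspecting its `coef_` attribute.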

In [8]:
# cross-validate the lasso regression model on the testing set

lasso_test_cv = cross_val_score(lasso, X_test_sc, y_test, cv=kf).mean()
print('Lasso regression with the testing data produces a mean R-squared score of {}.' .format(lasso_test_cv))


When the R-squared score for the testing set is close to the score for the training set, the model is neither overfit nor underfit.
<br><br>
In conclusion, a lasso regression on my interesting features is my best model. It offers the best balance between explanatory power for sale price and ease of interpretation.
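
One common way to check for overfitting is to fit the model on the training split only and then score it on the untouched hold-out split. The sketch below shows that pattern on synthetic data from `make_regression`, not on the Ames data, so the numbers it prints say nothing about the model above. The names `Xd_train`, `Xd_test`, `yd_train`, `yd_test`, and `lasso_holdout` are hypothetical.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import train_test_split

# synthetic stand-in for a prepared feature matrix and target
X_demo, y_demo = make_regression(n_samples=300, n_features=10, n_informative=5,
                                 noise=15.0, random_state=77)
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=77)

# fit on the training split only, then score both splits with the same fitted model
lasso_holdout = LassoCV(cv=5).fit(Xd_train, yd_train)
train_r2 = lasso_holdout.score(Xd_train, yd_train)
test_r2 = lasso_holdout.score(Xd_test, yd_test)

print('train R-squared: {:.3f}'.format(train_r2))
print('test R-squared:  {:.3f}'.format(test_r2))
print('gap:             {:.3f}'.format(train_r2 - test_r2))
```

A training score far above the hold-out score suggests overfitting, while two low scores suggest underfitting.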

#### Create predictions for the test dataset for Kaggle

In [9]:
# note: this cell fits and predicts with the ridge model, not the lasso model chosen above
ridge.fit(X_train_sc , y_train)

y_hat = ridge.predict(Xtest_sc)

In [10]:
# assemble the submission frame: one row per house in the Kaggle test set
predictions = pd.DataFrame({'Id' : df_test['id'] , 'SalePrice' : y_hat})
predictions.head()

| | Id | SalePrice |
|---|---|---|
| 0 | 2658 | 117423.253573 |
| 1 | 2718 | 178049.606242 |
| 2 | 2414 | 182514.148095 |
| 3 | 1989 | 114576.84139 |
| 4 | 625 | 186818.688282 |


In [11]:
# save predictions.csv for kaggle

predictions.to_csv('../datasets/predictions.csv' , index = False)