# Project 2 - Ames Housing Data and Kaggle Challenge
## Revisited

When reviewing the projects I completed during my time with General Assembly, I found that Project 2, based on an existing Kaggle challenge, offered the most room for improvement. My notebooks were a mess: the repository as a whole had no clear organization, and each individual notebook ran through the full 'Data Science Process' (cleaning through model deployment) for a different list of features. 

The project itself was not particularly complicated, but the way I originally approached it was. Revisiting it gave me a chance to redo the work in a cleaner, more succinct manner. 

**Imports:**

In [1]:
import pandas as pd
pd.set_option("display.max_columns", None)

import numpy as np
np.random.seed(42)

import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style = "darkgrid")

from sklearn import metrics 
from sklearn.metrics import mean_squared_error, r2_score

from sklearn.linear_model import LinearRegression, LassoCV, RidgeCV, HuberRegressor
from sklearn.tree import DecisionTreeRegressor

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split, cross_val_score, cross_val_predict, GridSearchCV

**Reading in Model-Ready Data**

In [2]:
#reading in training and testing data
ames_df = pd.read_csv('../datasets/train_model_ready.csv')
ames_test_df = pd.read_csv('../datasets/test_model_ready.csv')

In [3]:
ames_df.head()

Unnamed: 0,ms_subclass,lot_frontage,lot_area,street,alley,land_contour,utilities,land_slope,condition_1,condition_2,overall_qual,overall_cond,year_built,year_remod/add,mas_vnr_area,exter_qual,exter_cond,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating,heating_qc,central_air,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,fireplaces,fireplace_qu,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,misc_val,mo_sold,yr_sold,saleprice,ms_zoning_C (all),ms_zoning_FV,ms_zoning_I (all),ms_zoning_RH,ms_zoning_RL,ms_zoning_RM,lot_shape_IR2,lot_shape_IR3,lot_shape_Reg,lot_config_CulDSac,lot_config_FR2,lot_config_FR3,lot_config_Inside,neighborhood_Blueste,neighborhood_BrDale,neighborhood_BrkSide,neighborhood_ClearCr,neighborhood_CollgCr,neighborhood_Crawfor,neighborhood_Edwards,neighborhood_Gilbert,neighborhood_Greens,neighborhood_GrnHill,neighborhood_IDOTRR,neighborhood_Landmrk,neighborhood_MeadowV,neighborhood_Mitchel,neighborhood_NAmes,neighborhood_NPkVill,neighborhood_NWAmes,neighborhood_NoRidge,neighborhood_NridgHt,neighborhood_OldTown,neighborhood_SWISU,neighborhood_Sawyer,neighborhood_SawyerW,neighborhood_Somerst,neighborhood_StoneBr,neighborhood_Timber,neighborhood_Veenker,bldg_type_2fmCon,bldg_type_Duplex,bldg_type_Twnhs,bldg_type_TwnhsE,house_style_1.5Unf,house_style_1Story,house_style_2.5Fin,house_style_2.5Unf,house_style_2Story,house_style_SFoyer,house_style_SLvl,roof_style_Gable,roof_style_Gambrel,roof_style_Hip,roof_style_Mansard,roof_style_Shed,roof_matl_CompShg,roof_matl_Membran,roof_matl_Tar&Grv,roof_matl_WdShake,roof_matl_WdShngl,exterior_1st_AsphShn,exterior_1st_BrkComm,exterior_1st_BrkFace,exterior_1st_CBlock,exterior_1st_CemntBd,exterior_1st_HdBoard,exterior_1s
t_ImStucc,exterior_1st_MetalSd,exterior_1st_Plywood,exterior_1st_Stone,exterior_1st_Stucco,exterior_1st_VinylSd,exterior_1st_Wd Sdng,exterior_1st_WdShing,exterior_2nd_AsphShn,exterior_2nd_Brk Cmn,exterior_2nd_BrkFace,exterior_2nd_CBlock,exterior_2nd_CmentBd,exterior_2nd_HdBoard,exterior_2nd_ImStucc,exterior_2nd_MetalSd,exterior_2nd_Plywood,exterior_2nd_Stone,exterior_2nd_Stucco,exterior_2nd_VinylSd,exterior_2nd_Wd Sdng,exterior_2nd_Wd Shng,mas_vnr_type_BrkFace,mas_vnr_type_None,mas_vnr_type_Stone,foundation_CBlock,foundation_PConc,foundation_Slab,foundation_Stone,foundation_Wood,electrical_FuseF,electrical_FuseP,electrical_Mix,electrical_SBrkr,functional_1,functional_2,functional_Sev,garage_type_Attchd,garage_type_Basment,garage_type_BuiltIn,garage_type_CarPort,garage_type_Detchd,garage_type_NoGrg,fence_2,fence_3,fence_4,fence_MnWw,misc_feature_Gar2,misc_feature_NoMisc,misc_feature_Othr,misc_feature_Shed,misc_feature_TenC,sale_type_CWD,sale_type_Con,sale_type_ConLD,sale_type_ConLI,sale_type_ConLw,sale_type_New,sale_type_Oth,sale_type_WD
0,60,68.0,13517,1,1,1,1,1,0,1,6,8,1976,2005,289.0,4,3,3,3,1,6,533.0,1,0.0,192.0,725.0,1,5,1,725,754,0,1479,0.0,0.0,2,1,3,1,4,6,0,0,1976.0,2,2.0,475.0,3,3,2,0,44,0,0,0,0,0,0,3,2010,130500,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
1,60,43.0,11492,1,1,1,1,1,1,1,7,5,1996,1997,132.0,4,3,4,3,1,6,637.0,1,0.0,276.0,913.0,1,5,1,913,1209,0,2122,1.0,0.0,2,1,4,1,4,8,1,3,1997.0,2,2.0,559.0,3,3,2,0,74,0,0,0,0,0,0,4,2009,220000,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
2,20,68.0,7922,1,1,1,1,1,1,1,5,7,1953,2007,0.0,3,4,3,3,1,6,731.0,1,0.0,326.0,1057.0,1,3,1,1057,0,0,1057,1.0,0.0,1,0,3,1,4,5,0,0,1953.0,1,1.0,246.0,3,3,2,0,52,0,0,0,0,0,0,1,2010,109000,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
3,60,73.0,9802,1,1,1,1,1,1,1,5,5,2006,2007,0.0,3,3,4,3,1,1,0.0,1,0.0,384.0,384.0,1,4,1,744,700,0,1444,0.0,0.0,2,1,3,1,3,7,0,0,2007.0,3,2.0,400.0,3,3,2,100,0,0,0,0,0,0,0,4,2010,174000,0,0,0,0,1,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
4,50,82.0,14235,1,1,1,1,1,1,1,6,8,1900,1993,0.0,3,3,2,4,1,1,0.0,1,0.0,676.0,676.0,1,3,1,831,614,0,1445,0.0,0.0,2,0,3,1,3,6,0,0,1957.0,1,2.0,484.0,3,3,0,0,59,0,0,0,0,0,0,3,2010,138500,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1


In [4]:
ames_test_df.head()

Unnamed: 0,ms_subclass,lot_frontage,lot_area,street,alley,land_contour,utilities,land_slope,condition_1,condition_2,overall_qual,overall_cond,year_built,year_remod/add,mas_vnr_area,exter_qual,exter_cond,bsmt_qual,bsmt_cond,bsmt_exposure,bsmtfin_type_1,bsmtfin_sf_1,bsmtfin_type_2,bsmtfin_sf_2,bsmt_unf_sf,total_bsmt_sf,heating_qc,central_air,1st_flr_sf,2nd_flr_sf,low_qual_fin_sf,gr_liv_area,bsmt_full_bath,bsmt_half_bath,full_bath,half_bath,bedroom_abvgr,kitchen_abvgr,kitchen_qual,totrms_abvgrd,functional,fireplaces,fireplace_qu,garage_yr_blt,garage_finish,garage_cars,garage_area,garage_qual,garage_cond,paved_drive,wood_deck_sf,open_porch_sf,enclosed_porch,3ssn_porch,screen_porch,pool_area,pool_qc,misc_val,mo_sold,yr_sold,ms_zoning_FV,ms_zoning_I (all),ms_zoning_RH,ms_zoning_RL,ms_zoning_RM,lot_shape_IR2,lot_shape_IR3,lot_shape_Reg,lot_config_CulDSac,lot_config_FR2,lot_config_FR3,lot_config_Inside,neighborhood_Blueste,neighborhood_BrDale,neighborhood_BrkSide,neighborhood_ClearCr,neighborhood_CollgCr,neighborhood_Crawfor,neighborhood_Edwards,neighborhood_Gilbert,neighborhood_Greens,neighborhood_IDOTRR,neighborhood_MeadowV,neighborhood_Mitchel,neighborhood_NAmes,neighborhood_NPkVill,neighborhood_NWAmes,neighborhood_NoRidge,neighborhood_NridgHt,neighborhood_OldTown,neighborhood_SWISU,neighborhood_Sawyer,neighborhood_SawyerW,neighborhood_Somerst,neighborhood_StoneBr,neighborhood_Timber,neighborhood_Veenker,bldg_type_2fmCon,bldg_type_Duplex,bldg_type_Twnhs,bldg_type_TwnhsE,house_style_1.5Unf,house_style_1Story,house_style_2.5Fin,house_style_2.5Unf,house_style_2Story,house_style_SFoyer,house_style_SLvl,roof_style_Gable,roof_style_Gambrel,roof_style_Hip,roof_style_Mansard,roof_style_Shed,roof_matl_Metal,roof_matl_Roll,roof_matl_Tar&Grv,roof_matl_WdShake,roof_matl_WdShngl,exterior_1st_AsphShn,exterior_1st_BrkComm,exterior_1st_BrkFace,exterior_1st_CemntBd,exterior_1st_HdBoard,exterior_1st_MetalSd,exterior_1st_Plywood,exterior_1st_PreCast,exterior_1st_Stucco,exterior_1st_VinylSd
,exterior_1st_Wd Sdng,exterior_1st_WdShing,exterior_2nd_AsphShn,exterior_2nd_Brk Cmn,exterior_2nd_BrkFace,exterior_2nd_CBlock,exterior_2nd_CmentBd,exterior_2nd_HdBoard,exterior_2nd_ImStucc,exterior_2nd_MetalSd,exterior_2nd_Other,exterior_2nd_Plywood,exterior_2nd_PreCast,exterior_2nd_Stucco,exterior_2nd_VinylSd,exterior_2nd_Wd Sdng,exterior_2nd_Wd Shng,mas_vnr_type_BrkFace,mas_vnr_type_CBlock,mas_vnr_type_None,mas_vnr_type_Stone,foundation_CBlock,foundation_PConc,foundation_Slab,foundation_Stone,foundation_Wood,heating_1,heating_Floor,electrical_FuseF,electrical_FuseP,electrical_SBrkr,garage_type_Attchd,garage_type_Basment,garage_type_BuiltIn,garage_type_CarPort,garage_type_Detchd,garage_type_NoGrg,fence_2,fence_3,fence_4,fence_MnWw,misc_feature_NoMisc,misc_feature_Othr,misc_feature_Shed,sale_type_CWD,sale_type_Con,sale_type_ConLD,sale_type_ConLI,sale_type_ConLw,sale_type_New,sale_type_Oth,sale_type_VWD,sale_type_WD
0,190,69.0,9142,1,0,1,1,1,1,1,6,8,1910,1950,0.0,3,2,2,3,1,1,0,1,0,1020,1020,4,0,908,1020,0,1928,0,0,2,0,4,2,2,9,2,0,0,1910.0,1,1,440,1,1,2,0,60,112,0,0,0,0,0,4,2006,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
1,90,68.0,9662,1,1,1,1,1,1,1,5,4,1977,1977,0.0,3,3,4,3,1,1,0,1,0,1967,1967,3,1,1967,0,0,1967,0,0,2,0,6,2,3,10,2,0,0,1977.0,3,2,580,3,3,2,170,0,0,0,0,0,0,0,8,2006,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
2,60,58.0,17104,1,1,1,1,1,1,1,7,5,2006,2006,0.0,4,3,4,4,3,6,554,1,0,100,654,5,1,664,832,0,1496,1,0,2,1,3,1,4,7,2,1,4,2006.0,2,2,426,3,3,2,100,24,0,0,0,0,0,0,9,2006,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0
3,30,60.0,8520,1,1,1,1,1,1,1,5,6,1923,2006,0.0,4,3,3,3,1,1,0,1,0,968,968,3,1,968,0,0,968,0,0,1,0,2,1,3,5,2,0,0,1935.0,1,2,480,2,3,0,0,0,184,0,0,0,0,0,7,2007,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1
4,20,68.0,9500,1,1,1,1,1,1,1,6,5,1963,1963,247.0,3,3,4,3,1,4,609,1,0,785,1394,4,1,1394,0,0,1394,1,0,1,1,3,1,3,6,2,2,4,1963.0,2,2,514,3,3,2,0,76,0,0,185,0,0,0,7,2009,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,1,0,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1


In [5]:
ames_df.shape

(2051, 188)

In [6]:
ames_test_df.shape

(878, 181)
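
The two frames differ in width (188 vs. 181 columns) because some dummy columns appear in only one of the files. Below is a minimal sketch of one way to line the test frame up with the training features before predicting. It assumes that a dummy column missing from the test data can safely be filled with 0. The helper name `align_to_train` and the toy frames are hypothetical, not part of the original notebooks. Columns that were encoded differently in the two files (for example `heating` or `functional`) would need to be re-encoded consistently rather than zero-filled.

```python
import pandas as pd

def align_to_train(train_df, test_df, target="saleprice"):
    """Return test_df with the same feature columns, in the same order, as train_df."""
    feature_cols = [c for c in train_df.columns if c != target]
    # Columns missing from the test frame are created and filled with 0;
    # columns that exist only in the test frame are dropped.
    return test_df.reindex(columns=feature_cols, fill_value=0)

# Toy frames standing in for ames_df and ames_test_df (assumption)
toy_train = pd.DataFrame({"lot_area": [1, 2],
                          "ms_zoning_C (all)": [0, 1],
                          "saleprice": [10, 20]})
toy_test = pd.DataFrame({"lot_area": [3], "sale_type_VWD": [1]})

aligned = align_to_train(toy_train, toy_test)
print(aligned.columns.tolist())
```

Reindexing against the training columns keeps the column order identical to what the model saw during fitting, which scikit-learn estimators rely on.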

**Setting Features**

In [7]:
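#correlation of every column with the target
#note: [:-1] drops the last column (sale_type_WD), not 'saleprice'; the target is dropped when X is set below
ames_corr = ames_df.corr()['saleprice'][:-1]
#keeping features with a correlation above 0.25 or below -0.10
list_feats = ames_corr[(ames_corr > 0.25) | (ames_corr < -0.10)].sort_values(ascending = False).keys()
list_feats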

Index(['saleprice', 'overall_qual', 'exter_qual', 'gr_liv_area',
       'kitchen_qual', 'garage_area', 'garage_cars', 'total_bsmt_sf',
       '1st_flr_sf', 'bsmt_qual', 'year_built', 'garage_finish',
       'year_remod/add', 'fireplace_qu', 'full_bath', 'foundation_PConc',
       'garage_yr_blt', 'totrms_abvgrd', 'mas_vnr_area', 'fireplaces',
       'heating_qc', 'neighborhood_NridgHt', 'bsmtfin_sf_1', 'bsmt_exposure',
       'sale_type_New', 'garage_type_Attchd', 'bsmtfin_type_1',
       'exterior_1st_VinylSd', 'exterior_2nd_VinylSd', 'open_porch_sf',
       'wood_deck_sf', 'lot_frontage', 'mas_vnr_type_Stone', 'lot_area',
       'paved_drive', 'garage_qual', 'bsmt_full_bath', 'half_bath',
       'central_air', 'roof_style_Hip', 'garage_cond', 'neighborhood_NoRidge',
       'mas_vnr_type_BrkFace', 'neighborhood_StoneBr', 'electrical_SBrkr',
       'fence_2', 'exterior_2nd_HdBoard', 'bldg_type_Duplex',
       'bldg_type_2fmCon', 'neighborhood_MeadowV', 'bldg_type_Twnhs',
       'exteri
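
The selection above filters on correlation with `saleprice` and leaves the target in the list until `X` is built. As a hedged alternative, the sketch below drops the target by name rather than by position. The function name `select_by_corr` and the toy data are hypothetical, and the thresholds simply mirror the ones used in the cell above.

```python
import pandas as pd

def select_by_corr(df, target, pos_min=0.25, neg_max=-0.10):
    """List features whose correlation with the target is above pos_min or below neg_max."""
    # Drop the target by label so the result never depends on column order
    corr = df.corr()[target].drop(target)
    keep = corr[(corr > pos_min) | (corr < neg_max)]
    return keep.sort_values(ascending=False).index.tolist()

# Toy data standing in for ames_df (assumption)
toy = pd.DataFrame({
    "saleprice": [100, 150, 200, 250, 300],
    "gr_liv_area": [10, 14, 21, 24, 30],
    "age": [50, 40, 30, 20, 10],
    "noise": [1, -1, 0, -1, 1],
})

selected = select_by_corr(toy, "saleprice")
print(selected)
```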

**Train-Test Split**

In [8]:
#setting features and target:
X = ames_df[list_feats].drop(columns = 'saleprice')
y = ames_df[['saleprice']]

In [9]:
# Train/test split data
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size = 0.20,
                                                    random_state = 42)

In [10]:
# Check train/test shape
X_train.shape, X_test.shape, y_train.shape, y_test.shape

((1640, 74), (411, 74), (1640, 1), (411, 1))

## Modeling

**Linear Regression**

In [11]:
lr_1 = LinearRegression()

lr_1.fit(X_train, y_train)
print(f'{lr_1.score(X_train, y_train)}, {lr_1.score(X_test, y_test)}')
print(abs(lr_1.score(X_train, y_train) - lr_1.score(X_test, y_test)))

0.8638593218194196, 0.8823403324365041
0.018481010617084515


In [12]:
cross_val_score(lr_1, X_train, y_train, cv = 5), cross_val_score(lr_1, X_test, y_test, cv = 5)

(array([0.8773554 , 0.83669231, 0.86851637, 0.83983927, 0.68888864]),
 array([0.82509076, 0.85482465, 0.88185277, 0.84462183, 0.85551288]))

In [13]:
cross_val_score(lr_1, X_train, y_train, cv = 5).mean(), cross_val_score(lr_1, X_test, y_test, cv = 5).mean()

(0.8222584006392598, 0.8523805758599539)

In [14]:
cross_val_score(lr_1, X, y, cv = 5)

array([0.8774819 , 0.88297072, 0.78768887, 0.87675435, 0.81784706])

In [15]:
cross_val_score(lr_1, X, y, cv = 5).mean()

0.8485485779784959

In [16]:
def model_eval(true_val, pred_val):
    #RMSE via np.sqrt: the squared = False argument is deprecated in recent scikit-learn releases
    rmse = np.sqrt(mean_squared_error(true_val, pred_val))
    r2 = r2_score(true_val, pred_val)
    
    return f'RMSE: {rmse}', f'R2 Score: {r2}'

In [17]:
lr_1_preds = lr_1.predict(X_test)

In [18]:
model_eval(y_test, lr_1_preds)

('RMSE: 26440.059338654333', 'R2 Score: 0.8823403324365041')

In [19]:
#sorting features by the absolute size of their coefficient (regardless of sign)
lr_features = pd.DataFrame(X_train.columns, columns = ['feature'])
lr_features['abscoef'] = np.abs(lr_1.coef_)[0]
lr_features.sort_values(by='abscoef', ascending = False).head(10)

Unnamed: 0,feature,abscoef
67,garage_type_NoGrg,60270.525444
42,neighborhood_StoneBr,52488.41251
40,neighborhood_NoRidge,37904.897215
20,neighborhood_NridgHt,37469.243918
62,neighborhood_Edwards,20448.138726
73,mas_vnr_type_None,20308.052478
31,mas_vnr_type_Stone,19145.351068
38,roof_style_Hip,19118.267183
23,sale_type_New,17139.173615
49,bldg_type_Twnhs,16674.14331


**Huber Regressor**

In [20]:
#https://machinelearningmastery.com/robust-regression-for-machine-learning-in-python/
huber = HuberRegressor(epsilon = 1.33,
                       max_iter = 50_000)

huber.fit(X_train, y_train['saleprice'])
huber.score(X_train, y_train['saleprice']), huber.score(X_test, y_test['saleprice'])

(0.8311211884968475, 0.8869700841695844)

**Scaling with Standard Scaler**

In [21]:
sc = StandardScaler()

X_train_sc = sc.fit_transform(X_train)
X_test_sc = sc.transform(X_test)

In [22]:
#LinReg with sc
lr_1.fit(X_train_sc, y_train)
lr_1.score(X_train_sc, y_train), lr_1.score(X_test_sc, y_test)

(0.8638593218194196, 0.8823403324364937)

In [23]:
#note: X here is the unscaled data, so these scores match In [14]
cross_val_score(lr_1, X, y, cv = 5)

array([0.8774819 , 0.88297072, 0.78768887, 0.87675435, 0.81784706])

In [24]:
cross_val_score(lr_1, X, y, cv = 5).mean()

0.8485485779784959

In [25]:
#huber with sc
huber.fit(X_train_sc, y_train['saleprice'])
huber.score(X_train_sc, y_train['saleprice']), huber.score(X_test_sc, y_test['saleprice'])

(0.8450636384756319, 0.8931095639691742)

**LASSO**

In [26]:
y_train.shape

(1640, 1)

In [27]:
y_train['saleprice'].shape

(1640,)

In [28]:
l_alphas = np.logspace(-3, 3, 100)

lasso = LassoCV(alphas = l_alphas,
                cv = 5,
                max_iter = 50_000)

lasso.fit(X_train_sc, y_train['saleprice'])
lasso.score(X_train_sc, y_train['saleprice']), lasso.score(X_test_sc, y_test['saleprice'])

(0.8618223190357067, 0.8845382054616211)

In [29]:
lasso.alpha_

247.7076355991714

In [30]:
#note: X is unscaled here; LassoCV reselects alpha inside each fold
cross_val_score(lasso, X, y['saleprice'], cv = 5)

array([0.87716557, 0.88475214, 0.78266569, 0.87993822, 0.81756628])

In [31]:
cross_val_score(lasso, X, y['saleprice'], cv = 5).mean()

0.8484175795811195
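
The cross-validation calls above pass the unscaled `X`, so the `StandardScaler` fitted earlier plays no part in those scores. One way to cross-validate with scaling is to wrap the scaler and the model in a pipeline, so that the scaler is refit on the training portion of every fold. The sketch below uses synthetic data from `make_regression` as a stand-in for the Ames features (an assumption), and the names `X_demo`, `y_demo` and `pipe` are illustrative only.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LassoCV
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y['saleprice'] (assumption, not the Ames data)
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 noise=10.0, random_state=42)

# The scaler is fit inside each fold, so no information leaks from the held-out part
pipe = make_pipeline(StandardScaler(),
                     LassoCV(cv=5, max_iter=50_000))

scores = cross_val_score(pipe, X_demo, y_demo, cv=5)
print(scores.mean())
```

With the real data, the same pipeline could be passed `X` and `y['saleprice']` in place of the synthetic arrays.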

**Ridge**

In [32]:
ridge = RidgeCV(alphas = l_alphas)

ridge.fit(X_train_sc, y_train['saleprice'])
ridge.score(X_train_sc, y_train['saleprice']), ridge.score(X_test_sc, y_test['saleprice'])

(0.8574105150936255, 0.8840070105758688)

In [33]:
ridge.alpha_

376.49358067924715

**DecisionTreeRegressor**

- With default values:

In [34]:
dtr = DecisionTreeRegressor()

dtr.fit(X_train, y_train)
dtr.score(X_train, y_train['saleprice']), dtr.score(X_test, y_test['saleprice'])

(1.0, 0.7998808025195299)

- With GridSearch:

In [35]:
grid = GridSearchCV(estimator = dtr,
                    param_grid = {'max_depth': [None, 2, 3, 5, 7],
                                  'min_samples_split': [2, 5, 10, 15, 20],
                                  'min_samples_leaf': [1, 2, 3, 4, 5, 6],
                                  'ccp_alpha': [0, 0.001, 0.01, 0.1, 1, 10]},
                    cv = 5,
                    verbose = 1)

In [36]:
import time
t0 = time.time()

grid.fit(X_train, y_train)

print(time.time() - t0)

Fitting 5 folds for each of 900 candidates, totalling 4500 fits
51.18533110618591


In [37]:
grid.score(X_train, y_train), grid.score(X_test, y_test)

(0.9176608776043584, 0.8199024159858579)

In [38]:
grid.best_params_

{'ccp_alpha': 10,
 'max_depth': None,
 'min_samples_leaf': 6,
 'min_samples_split': 20}

In [39]:
grid_preds = grid.predict(X_test)

In [40]:
model_eval(y_test, grid_preds)

('RMSE: 32711.65825554782', 'R2 Score: 0.8199024159858579')
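
Since each model above is scored in its own cell, a side-by-side comparison can be easier to read in a single table. The sketch below is one possible way to collect train R2, test R2 and test RMSE for several fitted models. The helper `compare_models` is hypothetical and the data is synthetic, so the numbers it prints say nothing about the Ames results.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

def compare_models(models, X_tr, X_te, y_tr, y_te):
    """Fit each model and return one row of scores per model, best test RMSE first."""
    rows = []
    for name, model in models.items():
        model.fit(X_tr, y_tr)
        rmse = np.sqrt(mean_squared_error(y_te, model.predict(X_te)))
        rows.append({"model": name,
                     "train_r2": model.score(X_tr, y_tr),
                     "test_r2": model.score(X_te, y_te),
                     "test_rmse": rmse})
    return pd.DataFrame(rows).sort_values("test_rmse").reset_index(drop=True)

# Synthetic stand-in for the Ames features and target (assumption)
X_demo, y_demo = make_regression(n_samples=300, n_features=8,
                                 noise=15.0, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.20, random_state=42)

summary = compare_models({"linear": LinearRegression(),
                          "tree": DecisionTreeRegressor(max_depth=5, random_state=42)},
                         X_tr, X_te, y_tr, y_te)
print(summary)
```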