# Advanced Regression Assignment
Problem Statement
A US-based housing company named Surprise Housing has decided to enter the Australian market. The company uses data analytics to purchase houses at a price below their actual values and flip them at a higher price. For this purpose, the company has collected a data set from the sale of houses in Australia.

# Goals of the Case Study
You are required to model the price of houses with the available independent variables. This model will then be used by the management to understand how exactly the prices vary with the variables.
They can then adjust the firm's strategy accordingly and concentrate on areas that will yield high returns. Further, the model will be a good way for management to understand the pricing dynamics of a new market.

# importing warning package to ignore the warnings
import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter(action='ignore', category=FutureWarning)
# Importing the required libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler,MinMaxScaler
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import RFE
from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import LinearRegression,Ridge,Lasso
from sklearn.metrics import r2_score,mean_squared_error
from sklearn import metrics
Importing and Understanding Data
# Reading and inspecting the dataframe

h_data = pd.read_csv(r"train.csv")
h_data.head()

h_data.describe(include='all')

#inspecting the dataframe
# checking the number of rows and columns
h_data.shape
(1460, 81)
h_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 81 columns):
Id               1460 non-null int64
MSSubClass       1460 non-null int64
MSZoning         1460 non-null object
LotFrontage      1201 non-null float64
LotArea          1460 non-null int64
Street           1460 non-null object
Alley            91 non-null object
LotShape         1460 non-null object
LandContour      1460 non-null object
Utilities        1460 non-null object
LotConfig        1460 non-null object
LandSlope        1460 non-null object
Neighborhood     1460 non-null object
Condition1       1460 non-null object
Condition2       1460 non-null object
BldgType         1460 non-null object
HouseStyle       1460 non-null object
OverallQual      1460 non-null int64
OverallCond      1460 non-null int64
YearBuilt        1460 non-null int64
YearRemodAdd     1460 non-null int64
RoofStyle        1460 non-null object
RoofMatl         1460 non-null object
Exterior1st      1460 non-null object
Exterior2nd      1460 non-null object
MasVnrType       1452 non-null object
MasVnrArea       1452 non-null float64
ExterQual        1460 non-null object
ExterCond        1460 non-null object
Foundation       1460 non-null object
BsmtQual         1423 non-null object
BsmtCond         1423 non-null object
BsmtExposure     1422 non-null object
BsmtFinType1     1423 non-null object
BsmtFinSF1       1460 non-null int64
BsmtFinType2     1422 non-null object
BsmtFinSF2       1460 non-null int64
BsmtUnfSF        1460 non-null int64
TotalBsmtSF      1460 non-null int64
Heating          1460 non-null object
HeatingQC        1460 non-null object
CentralAir       1460 non-null object
Electrical       1459 non-null object
1stFlrSF         1460 non-null int64
2ndFlrSF         1460 non-null int64
LowQualFinSF     1460 non-null int64
GrLivArea        1460 non-null int64
BsmtFullBath     1460 non-null int64
BsmtHalfBath     1460 non-null int64
FullBath         1460 non-null int64
HalfBath         1460 non-null int64
BedroomAbvGr     1460 non-null int64
KitchenAbvGr     1460 non-null int64
KitchenQual      1460 non-null object
TotRmsAbvGrd     1460 non-null int64
Functional       1460 non-null object
Fireplaces       1460 non-null int64
FireplaceQu      770 non-null object
GarageType       1379 non-null object
GarageYrBlt      1379 non-null float64
GarageFinish     1379 non-null object
GarageCars       1460 non-null int64
GarageArea       1460 non-null int64
GarageQual       1379 non-null object
GarageCond       1379 non-null object
PavedDrive       1460 non-null object
WoodDeckSF       1460 non-null int64
OpenPorchSF      1460 non-null int64
EnclosedPorch    1460 non-null int64
3SsnPorch        1460 non-null int64
ScreenPorch      1460 non-null int64
PoolArea         1460 non-null int64
PoolQC           7 non-null object
Fence            281 non-null object
MiscFeature      54 non-null object
MiscVal          1460 non-null int64
MoSold           1460 non-null int64
YrSold           1460 non-null int64
SaleType         1460 non-null object
SaleCondition    1460 non-null object
SalePrice        1460 non-null int64
dtypes: float64(3), int64(35), object(43)
memory usage: 924.0+ KB
#Checking the Null values

h_data.isnull().sum()
Id                  0
MSSubClass          0
MSZoning            0
LotFrontage       259
LotArea             0
Street              0
Alley            1369
LotShape            0
LandContour         0
Utilities           0
LotConfig           0
LandSlope           0
Neighborhood        0
Condition1          0
Condition2          0
BldgType            0
HouseStyle          0
OverallQual         0
OverallCond         0
YearBuilt           0
YearRemodAdd        0
RoofStyle           0
RoofMatl            0
Exterior1st         0
Exterior2nd         0
MasVnrType          8
MasVnrArea          8
ExterQual           0
ExterCond           0
Foundation          0
                 ... 
BedroomAbvGr        0
KitchenAbvGr        0
KitchenQual         0
TotRmsAbvGrd        0
Functional          0
Fireplaces          0
FireplaceQu       690
GarageType         81
GarageYrBlt        81
GarageFinish       81
GarageCars          0
GarageArea          0
GarageQual         81
GarageCond         81
PavedDrive          0
WoodDeckSF          0
OpenPorchSF         0
EnclosedPorch       0
3SsnPorch           0
ScreenPorch         0
PoolArea            0
PoolQC           1453
Fence            1179
MiscFeature      1406
MiscVal             0
MoSold              0
YrSold              0
SaleType            0
SaleCondition       0
SalePrice           0
Length: 81, dtype: int64
# Checking if there are columns with one unique value since it won't affect our analysis
h_data.nunique()
Id               1460
MSSubClass         15
MSZoning            5
LotFrontage       110
LotArea          1073
Street              2
Alley               2
LotShape            4
LandContour         4
Utilities           2
LotConfig           5
LandSlope           3
Neighborhood       25
Condition1          9
Condition2          8
BldgType            5
HouseStyle          8
OverallQual        10
OverallCond         9
YearBuilt         112
YearRemodAdd       61
RoofStyle           6
RoofMatl            8
Exterior1st        15
Exterior2nd        16
MasVnrType          4
MasVnrArea        327
ExterQual           4
ExterCond           5
Foundation          6
                 ... 
BedroomAbvGr        8
KitchenAbvGr        4
KitchenQual         4
TotRmsAbvGrd       12
Functional          7
Fireplaces          4
FireplaceQu         5
GarageType          6
GarageYrBlt        97
GarageFinish        3
GarageCars          5
GarageArea        441
GarageQual          5
GarageCond          5
PavedDrive          3
WoodDeckSF        274
OpenPorchSF       202
EnclosedPorch     120
3SsnPorch          20
ScreenPorch        76
PoolArea            8
PoolQC              3
Fence               4
MiscFeature         4
MiscVal            21
MoSold             12
YrSold              5
SaleType            9
SaleCondition       6
SalePrice         663
Length: 81, dtype: int64
#Checking the value count

h_data.PoolQC.value_counts()
Gd    3
Ex    2
Fa    2
Name: PoolQC, dtype: int64
h_data.Alley.value_counts()
Grvl    50
Pave    41
Name: Alley, dtype: int64
h_data.Street.value_counts()
Pave    1454
Grvl       6
Name: Street, dtype: int64
h_data.Utilities.value_counts()
AllPub    1459
NoSeWa       1
Name: Utilities, dtype: int64
Data Preparation (Encoding Categorical Variables, Handling Null Values)
Imputing Null Values
## Checking the percentage of Null values

df_missing=pd.DataFrame((round(100*(h_data.isnull().sum()/len(h_data.index)), 2)), columns=['missing'])
df_missing.sort_values(by=['missing'], ascending=False).head(20)
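The missing-value percentage above can also be computed with `isnull().mean()`, which returns the fraction of nulls per column directly. The following is a minimal sketch on a toy frame; `toy` and its values are made up for illustration and are not the housing data.

```python
import numpy as np
import pandas as pd

# Toy frame standing in for h_data; the values are illustrative only.
toy = pd.DataFrame({
    "PoolQC": [np.nan, np.nan, np.nan, "Gd"],
    "LotArea": [8450, 9600, 11250, 9550],
    "Fence": [np.nan, "MnPrv", np.nan, "GdWo"],
})

# isnull().mean() gives the fraction of missing values per column.
missing_pct = (toy.isnull().mean() * 100).round(2)

# Keep only the columns that actually have gaps, worst first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```

Filtering out the complete columns keeps the table short when most of the columns have no nulls at all.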

## Treating the NaN Values
h_data['PoolQC'] = h_data['PoolQC'].fillna('No_Pool')
h_data['MiscFeature'] = h_data['MiscFeature'].fillna('None')
h_data['Alley'] = h_data['Alley'].fillna('No_Alley_Access')
h_data['Fence'] = h_data['Fence'].fillna('No_Fence')
h_data['FireplaceQu'] = h_data['FireplaceQu'].fillna('No_Fireplace')
h_data['GarageYrBlt'] = h_data['GarageYrBlt'].fillna(0)
h_data['MasVnrType'] = h_data['MasVnrType'].fillna('None')
h_data['MasVnrArea'] = h_data['MasVnrArea'].fillna(0)
h_data['Electrical'] = h_data['Electrical'].fillna("Other")
## Dropping the LotFrontage column as it has many null values (259 of 1460 rows)
h_data.drop("LotFrontage",axis = 1, inplace=True)
# Imputing the NaN values with 'No_Basement'
for col in ('BsmtFinType1', 'BsmtFinType2', 'BsmtExposure', 'BsmtQual','BsmtCond'):
    h_data[col] = h_data[col].fillna('No_Basement')
# Imputing the NaN values with 'No_Garage'
for col in ('GarageType', 'GarageFinish', 'GarageQual', 'GarageCond'):
    h_data[col] = h_data[col].fillna('No_Garage')
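The column-by-column imputation above can be written more compactly by passing a dictionary to `fillna`, so that each column gets its own fill value in one call. This is a sketch on a small made-up frame (`toy` and `fill_values` are hypothetical names), not a replacement for the cells above.

```python
import numpy as np
import pandas as pd

# Toy frame with the same kind of gaps as the housing data (illustrative values).
toy = pd.DataFrame({
    "GarageType": ["Attchd", np.nan, "Detchd"],
    "BsmtQual": [np.nan, "Gd", "TA"],
    "MasVnrArea": [196.0, np.nan, 0.0],
})

# One fill value per column: a label for "feature absent" or 0 for an area.
fill_values = {
    "GarageType": "No_Garage",
    "BsmtQual": "No_Basement",
    "MasVnrArea": 0,
}

# fillna with a dict only touches the listed columns.
toy_filled = toy.fillna(value=fill_values)
print(toy_filled)
```

Keeping the mapping in one dictionary makes it easier to review which label each column receives.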
h_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1460 entries, 0 to 1459
Data columns (total 80 columns):
Id               1460 non-null int64
MSSubClass       1460 non-null int64
MSZoning         1460 non-null object
LotArea          1460 non-null int64
Street           1460 non-null int64
Alley            1460 non-null object
LotShape         1460 non-null object
LandContour      1460 non-null object
Utilities        1460 non-null object
LotConfig        1460 non-null object
LandSlope        1460 non-null object
Neighborhood     1460 non-null object
Condition1       1460 non-null object
Condition2       1460 non-null object
BldgType         1460 non-null object
HouseStyle       1460 non-null object
OverallQual      1460 non-null int64
OverallCond      1460 non-null int64
YearBuilt        1460 non-null int64
YearRemodAdd     1460 non-null int64
RoofStyle        1460 non-null object
RoofMatl         1460 non-null object
Exterior1st      1460 non-null object
Exterior2nd      1460 non-null object
MasVnrType       1460 non-null object
MasVnrArea       1460 non-null float64
ExterQual        1460 non-null object
ExterCond        1460 non-null object
Foundation       1460 non-null object
BsmtQual         1460 non-null object
BsmtCond         1460 non-null object
BsmtExposure     1460 non-null object
BsmtFinType1     1460 non-null object
BsmtFinSF1       1460 non-null int64
BsmtFinType2     1460 non-null object
BsmtFinSF2       1460 non-null int64
BsmtUnfSF        1460 non-null int64
TotalBsmtSF      1460 non-null int64
Heating          1460 non-null object
HeatingQC        1460 non-null object
CentralAir       1460 non-null int64
Electrical       1460 non-null object
1stFlrSF         1460 non-null int64
2ndFlrSF         1460 non-null int64
LowQualFinSF     1460 non-null int64
GrLivArea        1460 non-null int64
BsmtFullBath     1460 non-null int64
BsmtHalfBath     1460 non-null int64
FullBath         1460 non-null int64
HalfBath         1460 non-null int64
BedroomAbvGr     1460 non-null int64
KitchenAbvGr     1460 non-null int64
KitchenQual      1460 non-null object
TotRmsAbvGrd     1460 non-null int64
Functional       1460 non-null object
Fireplaces       1460 non-null int64
FireplaceQu      1460 non-null object
GarageType       1460 non-null object
GarageYrBlt      1460 non-null int32
GarageFinish     1460 non-null object
GarageCars       1460 non-null int64
GarageArea       1460 non-null int64
GarageQual       1460 non-null object
GarageCond       1460 non-null object
PavedDrive       1460 non-null int64
WoodDeckSF       1460 non-null int64
OpenPorchSF      1460 non-null int64
EnclosedPorch    1460 non-null int64
3SsnPorch        1460 non-null int64
ScreenPorch      1460 non-null int64
PoolArea         1460 non-null int64
PoolQC           1460 non-null object
Fence            1460 non-null object
MiscFeature      1460 non-null object
MiscVal          1460 non-null int64
MoSold           1460 non-null int64
YrSold           1460 non-null int64
SaleType         1460 non-null object
SaleCondition    1460 non-null object
SalePrice        1460 non-null int64
dtypes: float64(1), int32(1), int64(38), object(40)
memory usage: 906.9+ KB
h_data['GarageYrBlt'] = h_data['GarageYrBlt'].astype(int)
Performing EDA
Univariate and Bivariate Analysis
plt.scatter(h_data.MasVnrArea,h_data.SalePrice)
# plotting a distplot 
plt.figure(figsize = (10,5))
sns.distplot(h_data['MasVnrArea']).tick_params(axis='x', rotation = 90)
plt.title('Veneer Area')
sns.distplot(h_data['SalePrice'])
print("Skewness: %f" % h_data['SalePrice'].skew())
print("Kurtosis: %f" % h_data['SalePrice'].kurt())
Skewness: 1.882876
Kurtosis: 6.536282
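A skewness of about 1.88 points to a long right tail in `SalePrice`. A log transform is a common way to reduce that kind of skew. The sketch below illustrates the effect on synthetic log-normal "prices"; the data and the names `prices`, `raw_skew` and `log_skew` are assumptions for illustration, not values from the housing file.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic right-skewed prices (log-normal), not the housing data.
prices = pd.Series(rng.lognormal(mean=12.0, sigma=0.4, size=5000))

# Skewness before and after taking the natural log.
raw_skew = prices.skew()
log_skew = np.log(prices).skew()

print(f"skew before log: {raw_skew:.3f}, after log: {log_skew:.3f}")
```

On data of this shape the skewness after the transform sits close to zero, which is generally friendlier to linear models.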
## Checking Basement counts
sns.countplot(x='BsmtCond', data= h_data)
plt.title('Basement Condition')
sns.countplot(x='OverallCond', data= h_data).tick_params(axis='x', rotation = 90)
plt.title('Overall Condition')
5 is the most common overall condition rating.
data = pd.concat([h_data['SalePrice'], h_data['GrLivArea']], axis=1)
data.plot.scatter(x='GrLivArea', y='SalePrice', ylim=(0,800000));
plt.title('Gr LivArea vs SalePrice')
# Checking the outliers 

sns.boxplot(x='SalePrice', data=h_data)
sns.boxplot(x='OverallQual', y='SalePrice', data=h_data)
plt.title("Overall Quality vs SalePrice")
#scatterplot
sns.set()
cols = ['SalePrice', 'OverallQual', 'GrLivArea', 'GarageCars', 'TotalBsmtSF', 'FullBath', 'YearBuilt']
sns.pairplot(h_data[cols], height = 2.5)  # 'size' was renamed to 'height' in seaborn 0.9
plt.show();
# checking the correlation matrix (numeric columns only)
corrmat = h_data.select_dtypes(include=[np.number]).corr()
f, ax = plt.subplots(figsize=(12, 9))
sns.heatmap(corrmat, vmax=.8, square=True);
plt.title("Checking Correlation matrix ")
Data Preparation
plt.figure(figsize=(11,6))
sns.distplot(np.log(h_data["SalePrice"]))
Deriving Variables
numeric_data = h_data.select_dtypes(include = ['float64','int64'])
numeric_data.columns
Index(['Id', 'MSSubClass', 'LotArea', 'Street', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2',
       'BsmtUnfSF', 'TotalBsmtSF', 'CentralAir', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageCars', 'GarageArea', 'PavedDrive', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
       'MiscVal', 'MoSold', 'YrSold', 'SalePrice'],
      dtype='object')
# Converting binary variables into numeric datatypes
# by mapping them to 0 and 1
h_data['Street'] = h_data['Street'].apply(lambda x: 1 if x == 'Pave' else 0)

h_data['CentralAir'] = h_data['CentralAir'].apply(lambda x: 1 if x == 'Y' else 0)

# PavedDrive has three levels (see nunique above); every value other than 'Y' becomes 0
h_data['PavedDrive'] = h_data['PavedDrive'].apply(lambda x: 1 if x == 'Y' else 0)
cat_values = h_data.select_dtypes(include=['object'])
cat_values.head()

# convert into dummies
data_dummies = pd.get_dummies(cat_values, drop_first=True)
data_dummies.head()

## Dropping the 'Id' column
df = h_data.drop(['Id'],axis=1)
# Dropping the original categorical columns
df = df.drop(list(cat_values.columns), axis=1)
# Adding the dummy columns to the dataset
df = pd.concat([df,data_dummies], axis=1)
df.shape
(1460, 259)
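The dummy encoding above uses `drop_first=True`, which drops one level per categorical column so that the remaining indicator columns are not perfectly collinear with the intercept. A minimal sketch on a made-up three-row frame (`toy` and `encoded` are hypothetical names) shows which columns survive:

```python
import pandas as pd

# Toy frame with one numeric and two categorical columns (illustrative values).
toy = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],
    "MSZoning": ["RL", "RM", "RL"],
    "LotShape": ["Reg", "IR1", "Reg"],
})

# drop_first=True removes the first level (in sorted order) of each category:
# 'RL' for MSZoning and 'IR1' for LotShape.
encoded = pd.get_dummies(toy, columns=["MSZoning", "LotShape"], drop_first=True)
print(encoded.columns.tolist())
```

The dropped level becomes the baseline, so each remaining dummy coefficient is read relative to it.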
Train Test Split
df_train,df_test = train_test_split(df, train_size=0.7,test_size = 0.3, random_state=100)
# The target is modelled on the log scale
y_train = np.log(df_train.SalePrice)
X_train = df_train.drop("SalePrice", axis=1)

y_test = np.log(df_test.SalePrice)
X_test = df_test.drop("SalePrice", axis=1)
num_values=X_train.select_dtypes(include=['int64','float64']).columns
num_values
Index(['MSSubClass', 'LotArea', 'Street', 'OverallQual', 'OverallCond',
       'YearBuilt', 'YearRemodAdd', 'MasVnrArea', 'BsmtFinSF1', 'BsmtFinSF2',
       'BsmtUnfSF', 'TotalBsmtSF', 'CentralAir', '1stFlrSF', '2ndFlrSF',
       'LowQualFinSF', 'GrLivArea', 'BsmtFullBath', 'BsmtHalfBath', 'FullBath',
       'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr', 'TotRmsAbvGrd',
       'Fireplaces', 'GarageCars', 'GarageArea', 'PavedDrive', 'WoodDeckSF',
       'OpenPorchSF', 'EnclosedPorch', '3SsnPorch', 'ScreenPorch', 'PoolArea',
       'MiscVal', 'MoSold', 'YrSold'],
      dtype='object')
## Scaling the data
scaler = StandardScaler()
X_train[num_values] = scaler.fit_transform(X_train[num_values])
X_test[num_values] = scaler.transform(X_test[num_values])
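The scaling cell fits the scaler on the training rows and only applies it to the test rows, so no information from the test set leaks into the scaling statistics. The sketch below reproduces that pattern on synthetic numbers; the array `X` and the other names are assumptions for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)

# Synthetic numeric features, not the housing data.
X = rng.normal(loc=50.0, scale=10.0, size=(200, 3))
X_tr, X_te = train_test_split(X, train_size=0.7, random_state=100)

scaler = StandardScaler()
X_tr_scaled = scaler.fit_transform(X_tr)  # mean and std learned from train only
X_te_scaled = scaler.transform(X_te)      # the same statistics reused on test

print(X_tr_scaled.mean(axis=0).round(6))  # zero up to floating-point error
print(X_te_scaled.mean(axis=0).round(3))  # close to zero, but not exactly
```

The test-set means are not exactly zero, which is expected: the test rows are scaled with the training statistics.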
Model Building
## Building a Regression model.
reg = LinearRegression()
reg.fit(X_train,y_train)
LinearRegression()
# Running RFE to keep the 20 top-ranked features
rfe = RFE(reg, n_features_to_select=20)
rfe = rfe.fit(X_train, y_train)
col=X_train.columns[rfe.support_]
col
Index(['BsmtFinSF1', 'BsmtFinSF2', 'BsmtUnfSF', 'TotalBsmtSF', '1stFlrSF',
       '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'RoofMatl_Membran',
       'RoofMatl_Metal', 'GarageType_No_Garage', 'GarageFinish_No_Garage',
       'GarageQual_Fa', 'GarageQual_Gd', 'GarageQual_Po', 'GarageQual_TA',
       'GarageCond_Fa', 'GarageCond_Gd', 'GarageCond_Po', 'GarageCond_TA'],
      dtype='object')
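Recursive feature elimination repeatedly fits the estimator and discards the weakest feature until the requested number remains. The following sketch runs it on synthetic regression data; `X_demo`, `y_demo` and `selector` are hypothetical names, and the data come from `make_regression` rather than the housing file.

```python
from sklearn.datasets import make_regression
from sklearn.feature_selection import RFE
from sklearn.linear_model import LinearRegression

# Synthetic data: 10 features, of which 3 carry signal.
X_demo, y_demo = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=0.1, random_state=0
)

# Passing n_features_to_select by keyword works across scikit-learn versions.
selector = RFE(estimator=LinearRegression(), n_features_to_select=3)
selector.fit(X_demo, y_demo)

# support_ is a boolean mask of kept features; ranking_ is 1 for kept features.
selected = [i for i, keep in enumerate(selector.support_) if keep]
print("selected column indices:", selected)
print("ranking:", selector.ranking_)
```

`ranking_` also shows the order in which the other features were eliminated, which can help when choosing how many to keep.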
import statsmodels
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
X_train_new=X_train[col]
X_train_new = sm.add_constant(X_train_new)

#create first model
lr=sm.OLS(y_train,X_train_new)

#fit the model
lr_model=lr.fit()

#Print the summary 
lr_model.summary()
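The notebook imports `variance_inflation_factor` but does not call it, and several of the RFE-selected columns look strongly related (for example the basement area components alongside `TotalBsmtSF`). A variance inflation factor check is one way to quantify that. The sketch below is a plain NumPy re-derivation of VIF (1 / (1 - R²) from regressing each column on the others, with an intercept), run on synthetic data; the helper `vif` is hypothetical and is not the statsmodels function.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of a 2-D array X."""
    X = np.asarray(X, dtype=float)
    out = []
    for j in range(X.shape[1]):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Regress column j on the remaining columns plus an intercept.
        A = np.column_stack([np.ones(len(y)), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        r2 = 1.0 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(1)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + b + rng.normal(scale=0.1, size=500)  # nearly a sum of a and b

vifs_collinear = vif(np.column_stack([a, b, c]))
vifs_independent = vif(np.column_stack([a, b]))
print(vifs_collinear.round(1))
print(vifs_independent.round(3))
```

Values far above roughly 5 to 10 are usually read as a sign of multicollinearity; a column that is an exact sum of others would make R² equal to 1 and the VIF unbounded.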

Ridge Regression
# list of alphas to tune


params = {'alpha': [0.0001, 0.001, 0.01, 0.05, 0.1, 
 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0, 
 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 20, 50, 100]}


ridge = Ridge()

# cross validation
folds = 5
model_cv = GridSearchCV(estimator = ridge, 
                        param_grid = params, 
                        scoring= 'neg_mean_absolute_error', 
                        cv = folds, 
                        return_train_score=True,
                        verbose = 1)            
model_cv.fit(X_train, y_train)
Fitting 5 folds for each of 26 candidates, totalling 130 fits
GridSearchCV(cv=5, estimator=Ridge(),
             param_grid={'alpha': [0.0001, 0.001, 0.01, 0.05, 0.1, 0.2, 0.3,
                                   0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0, 2.0, 3.0,
                                   4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 20, 50,
                                   100]},
             return_train_score=True, scoring='neg_mean_absolute_error',
             verbose=1)
print(model_cv.best_params_)
print(model_cv.best_score_)
{'alpha': 0.1}
-0.08888204140206633
cv_results = pd.DataFrame(model_cv.cv_results_)
cv_results = cv_results[cv_results['param_alpha']<=100]
cv_results

# plotting mean test and train scores with alpha
# alpha must stay a float: casting to int32 would truncate every alpha below 1 to 0
cv_results['param_alpha'] = cv_results['param_alpha'].astype('float64')
plt.figure(figsize=(16,5))

# plotting
plt.plot(cv_results['param_alpha'], cv_results['mean_train_score'])
plt.plot(cv_results['param_alpha'], cv_results['mean_test_score'])
plt.xlabel('alpha')
plt.ylabel('Negative Mean Absolute Error')
plt.title("Negative Mean Absolute Error and alpha")
plt.legend(['train score', 'test score'], loc='upper right')
plt.show()
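Because the candidate alphas span several orders of magnitude, a log-spaced grid (and a log-scaled x-axis when plotting) tends to be easier to read than a linear one. The sketch below runs the same kind of search on synthetic data; `X_demo`, `y_demo`, `alphas` and `search` are assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression problem, not the housing data.
X_demo, y_demo = make_regression(
    n_samples=150, n_features=20, noise=10.0, random_state=0
)

# 13 alphas spaced evenly on a log scale between 1e-4 and 1e2.
alphas = np.logspace(-4, 2, 13)

search = GridSearchCV(
    estimator=Ridge(),
    param_grid={"alpha": alphas},
    scoring="neg_mean_absolute_error",
    cv=5,
    return_train_score=True,
)
search.fit(X_demo, y_demo)

results = pd.DataFrame(search.cv_results_)
# Keep alpha as a float so that small values are not truncated.
results["param_alpha"] = results["param_alpha"].astype(float)
best_alpha = search.best_params_["alpha"]
print(best_alpha, search.best_score_)
```

When plotting `mean_train_score` and `mean_test_score` against `param_alpha`, calling `plt.xscale("log")` before `plt.show()` spreads the small alphas out instead of bunching them near zero.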
# final ridge model
# alpha is set manually to 10 here; the grid search above reported 0.1 as best
alpha = 10
ridge = Ridge(alpha=alpha)

ridge.fit(X_train, y_train)
ridge.coef_
array([-2.38085026e-02,  1.75634227e-02,  0.00000000e+00,  8.04615652e-02,
        4.31367817e-02,  4.07086992e-02,  2.17211619e-02, -2.57778149e-03,
       -9.77783794e-04,  7.00415761e-03,  4.51716013e-03,  6.03030126e-03,
        0.00000000e+00,  3.69513293e-02,  4.21124128e-02,  1.77809813e-03,
        6.22615006e-02,  2.97695611e-02,  2.75399392e-03,  1.86436382e-02,
        1.15666866e-02,  1.18005677e-02, -1.28815662e-02,  1.63732548e-02,
        3.12275387e-03,  2.88670261e-06,  3.95658982e-02,  4.54806587e-03,
        0.00000000e+00,  1.34590878e-02, -2.88781233e-03,  9.69253650e-03,
        7.12342644e-03,  1.07041819e-02, -1.47512562e-02, -1.42341451e-04,
       -1.36109844e-03, -6.78297988e-03,  5.16997052e-02,  3.88806446e-02,
        6.70349088e-02,  1.94020866e-02,  2.10339812e-03,  2.64182554e-02,
        2.27327128e-02, -4.92681870e-02,  5.06950675e-03,  3.87953495e-02,
        4.26626055e-02,  4.80156887e-02, -1.22987288e-02,  3.83071439e-02,
       -3.32598162e-02, -5.52426903e-03, -5.47823043e-03,  2.53540290e-02,
       -7.83680526e-03, -3.84289083e-03, -2.22200649e-02,  1.75119148e-02,
        5.35585692e-02, -1.40964985e-02,  1.04957633e-01, -7.91649150e-02,
       -2.88576707e-02, -7.05846233e-02, -4.41282024e-02, -2.52501941e-02,
       -7.75544414e-03, -1.09260851e-02, -1.54478503e-02,  4.37180598e-02,
        7.97267916e-02, -2.49788234e-02,  1.57398453e-02, -2.80874624e-02,
       -7.35471784e-03,  6.90956346e-02,  6.03641515e-02, -1.70035271e-02,
        2.86241734e-02,  2.53831252e-03,  5.83873245e-02,  1.74023081e-02,
       -3.04891980e-02, -2.88387224e-02,  3.07415231e-02, -1.35302786e-03,
        6.26117423e-03,  1.86366442e-02,  3.70192900e-02,  2.37131251e-02,
       -9.53066214e-02, -3.43909586e-03, -1.66882777e-03,  9.54984929e-03,
        1.40320644e-02,  1.46247302e-02, -4.24924660e-02, -2.18213323e-03,
        3.56234524e-03,  2.90090624e-03, -1.09510894e-02,  1.62999459e-04,
       -3.12922711e-02,  3.21418067e-03, -3.65201541e-03, -2.40100267e-02,
        5.23573308e-04, -1.66842056e-02,  1.69750964e-02,  9.02682960e-03,
        3.09906238e-02,  7.13693860e-03,  4.73963040e-03,  1.05611794e-02,
        5.86303435e-03,  4.66785823e-03,  4.77265895e-02, -3.74490044e-03,
       -2.13135054e-02,  6.18500161e-02,  1.36604734e-03, -3.73329810e-03,
       -7.36336012e-03,  1.54819283e-03,  1.03847172e-02,  1.50226415e-02,
        3.06364856e-03, -1.62689510e-02,  2.39797003e-02, -1.14421961e-02,
        1.42074798e-02, -3.74490044e-03, -1.91496746e-02,  9.03392145e-03,
        1.36604734e-03,  1.24871088e-02,  1.62855361e-02,  2.33315023e-02,
        2.02126013e-02,  0.00000000e+00,  1.11067598e-02, -6.08793837e-04,
       -1.90280067e-02,  1.48998983e-02,  9.31514096e-03, -2.18833022e-02,
        1.24698561e-02,  7.16395112e-03,  1.66309944e-03, -1.91966186e-02,
        2.65303414e-02,  1.07999596e-02, -1.10189669e-02, -1.96169593e-02,
        0.00000000e+00,  1.00317429e-03,  2.30106844e-02,  3.80510938e-02,
        5.98319705e-03,  1.15507258e-02,  1.30639232e-03, -1.55685254e-02,
       -4.77912750e-02, -1.52432184e-02, -4.10843524e-02,  2.64101643e-02,
       -1.52432184e-02,  5.52233953e-03,  3.04732374e-02,  4.73620387e-02,
       -2.27378223e-03, -1.57017884e-02, -2.05201644e-02, -5.05562701e-03,
        1.49308760e-02, -1.56653928e-02, -1.52432184e-02, -2.05342485e-02,
       -4.72127558e-02, -2.69429539e-02,  4.70410780e-03,  4.66559480e-03,
       -1.40570124e-02, -1.15941196e-02,  1.98596956e-02,  1.31615873e-02,
        3.40529132e-02, -2.58878478e-02, -1.14352577e-02,  3.57597431e-03,
       -6.96386798e-03, -9.55911697e-03, -1.26903479e-02, -2.12464052e-02,
        9.74857477e-03, -4.30236620e-03,  0.00000000e+00,  0.00000000e+00,
        8.92971175e-03, -3.47421585e-02, -4.12675692e-02, -3.74672388e-02,
       -3.69206684e-02,  7.36336159e-03,  2.01175639e-02, -1.15931432e-02,
       -1.92620402e-02,  3.90061543e-02, -1.68890926e-03,  4.57247236e-03,
       -4.09154335e-02, -3.01010766e-02,  8.46192422e-04,  2.28773475e-02,
       -5.43172510e-03, -9.81882741e-03, -4.38902585e-03,  2.35974327e-03,
       -1.08343355e-03, -1.08343355e-03, -9.12266768e-03, -1.80250114e-02,
       -3.81983359e-02,  3.55315764e-02, -1.08343355e-03, -1.26226139e-03,
       -5.87801597e-03, -1.54766380e-02, -1.05399800e-02, -1.08343355e-03,
        6.86175843e-03,  9.34782273e-03,  2.54507826e-02, -1.11685854e-01,
        1.84340213e-02, -4.15121924e-02,  5.21650013e-03,  5.47745292e-03,
       -1.90639441e-04,  1.76112562e-02, -1.03832868e-02, -6.14335362e-03,
        0.00000000e+00,  1.93143180e-02,  1.55488591e-02,  5.27518760e-02,
        5.25442850e-04, -1.19773680e-03,  3.04590179e-02,  1.22604243e-02,
       -7.78495951e-03,  8.44756692e-03,  4.51306133e-03,  1.62464216e-02,
        4.84213326e-02,  3.04590179e-02])
# R-squared on the training set
y_train_pred = ridge.predict(X_train)
print(metrics.r2_score(y_true=y_train, y_pred=y_train_pred))
0.9212269068065557
# Prediction on test set
y_test_pred = ridge.predict(X_test)
print(metrics.r2_score(y_true=y_test, y_pred=y_test_pred))
0.883797205474737
# Printing the MSE on the test set (take the square root to get the RMSE)
mean_squared_error(y_test, y_test_pred)
0.0191200213520672
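The value printed above is a mean squared error on `log(SalePrice)`, so its square root gives the RMSE on the log scale, and predictions have to be passed through `np.exp` before they can be read as prices. A short sketch follows; the three log-scale targets and predictions are made up for illustration.

```python
import numpy as np

# Test MSE printed above, measured on log(SalePrice).
mse_log = 0.0191200213520672
rmse_log = np.sqrt(mse_log)
print(f"RMSE on the log scale: {rmse_log:.4f}")

# Hypothetical log-scale targets and predictions, for illustration only.
y_true_log = np.log(np.array([200000.0, 150000.0, 320000.0]))
y_pred_log = y_true_log + np.array([0.10, -0.05, 0.02])

# np.exp undoes the np.log applied to SalePrice before fitting.
price_true = np.exp(y_true_log)
price_pred = np.exp(y_pred_log)

# RMSE expressed in price units rather than log units.
rmse_price = np.sqrt(np.mean((price_true - price_pred) ** 2))
print(f"RMSE in price units (toy data): {rmse_price:.0f}")
```

Reporting the error in price units is usually easier for a business audience to interpret than an error in log units.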
Lasso Regression
#lasso
params = {'alpha': [0.00005, 0.0001, 0.001, 0.008, 0.01]}
lasso = Lasso()

# cross validation
lasso_cv = GridSearchCV(estimator = lasso, 
                        param_grid = params, 
                        scoring= 'neg_mean_absolute_error', 
                        cv = folds, 
                        return_train_score=True,
                        verbose = 1)            

lasso_cv.fit(X_train, y_train)
Fitting 5 folds for each of 5 candidates, totalling 25 fits
GridSearchCV(cv=5, estimator=Lasso(),
             param_grid={'alpha': [5e-05, 0.0001, 0.001, 0.008, 0.01]},
             return_train_score=True, scoring='neg_mean_absolute_error',
             verbose=1)
cv_results_l = pd.DataFrame(lasso_cv.cv_results_)
print(lasso_cv.best_params_)
print(lasso_cv.best_score_)
{'alpha': 0.0001}
-0.08346738697359261
# final lasso model
# alpha is set manually to 0.001 here; the grid search above reported 0.0001 as best
alpha = 0.001

lasso = Lasso(alpha=alpha)
        
lasso.fit(X_train, y_train) 
Lasso(alpha=0.001)
# R-squared on the training data
y_train_pred = lasso.predict(X_train)
print(metrics.r2_score(y_true=y_train, y_pred=y_train_pred))
0.9166555212910448
# R-squared on the test data

y_test_pred = lasso.predict(X_test)
print(metrics.r2_score(y_true=y_test, y_pred=y_test_pred))
0.8536281526856487
mean_squared_error(y_test, y_test_pred)
0.024084040813523458
lasso.coef_
array([-1.75634866e-02,  1.58475663e-02,  0.00000000e+00,  8.95868740e-02,
        4.79722973e-02,  5.84621697e-02,  2.30410409e-02,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  2.75277807e-02,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -2.00814730e-03,
        1.18807420e-01,  3.12434193e-02,  2.11452910e-03,  1.26692394e-02,
        9.17379695e-03,  5.51096212e-03, -1.23066024e-02,  1.40449551e-02,
        7.77203870e-03,  3.15938027e-06,  3.70663790e-02,  7.40517319e-03,
        0.00000000e+00,  1.29213805e-02, -6.96954111e-04,  6.95460429e-03,
        5.31359896e-03,  1.08168225e-02, -1.35454367e-02, -9.94734750e-04,
       -1.57186069e-03, -6.45623790e-03,  9.17408356e-03,  0.00000000e+00,
        5.44593886e-02, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  4.40478843e-03, -0.00000000e+00,  2.90794924e-02,
       -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,  7.53627739e-03,
        4.52628250e-02, -0.00000000e+00,  1.07972359e-01, -5.72026455e-02,
       -0.00000000e+00, -4.46403923e-02, -0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,  1.62853130e-02,
        7.52466238e-02, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  9.36114674e-02,  1.77863108e-02, -0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  5.53958673e-02,  0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -2.78966096e-01, -0.00000000e+00, -0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00, -4.75867363e-02,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
       -1.53123922e-02, -0.00000000e+00, -0.00000000e+00, -9.38194081e-04,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  5.80559886e-02,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  7.58843599e-03, -1.54768172e-03,
        0.00000000e+00, -0.00000000e+00, -0.00000000e+00,  0.00000000e+00,
        0.00000000e+00, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        1.71238413e-02,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00,  7.88298968e-03, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00,  0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        1.03465606e-02, -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00,  9.37638435e-03,  0.00000000e+00,  2.46843478e-02,
       -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,  0.00000000e+00,
       -2.02603872e-02, -0.00000000e+00, -4.93523237e-03,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  1.21302965e-02,  4.97046414e-02,
        0.00000000e+00, -9.81370521e-03, -0.00000000e+00,  0.00000000e+00,
        1.14246285e-02, -0.00000000e+00, -0.00000000e+00, -3.32305003e-03,
       -3.42024961e-02, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00,  1.03943089e-02,  0.00000000e+00,
        0.00000000e+00, -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00, -0.00000000e+00, -1.30071481e-02,
       -0.00000000e+00, -0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
        9.47870459e-04, -0.00000000e+00, -4.13191908e-03, -0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
       -0.00000000e+00,  2.73287598e-02,  0.00000000e+00,  5.19193873e-04,
       -2.98723139e-02, -0.00000000e+00, -0.00000000e+00,  2.04738622e-02,
       -0.00000000e+00, -0.00000000e+00, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00,  0.00000000e+00, -3.28279355e-04, -1.19674282e-02,
       -6.53187206e-03,  0.00000000e+00,  0.00000000e+00, -0.00000000e+00,
        0.00000000e+00, -0.00000000e+00, -0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00,  0.00000000e+00, -7.28011052e-01,
        0.00000000e+00, -1.92766524e-02,  0.00000000e+00,  0.00000000e+00,
        0.00000000e+00,  3.50204250e-03, -0.00000000e+00, -0.00000000e+00,
        0.00000000e+00,  0.00000000e+00,  0.00000000e+00,  0.00000000e+00,
       -0.00000000e+00, -0.00000000e+00,  3.58220479e-02,  0.00000000e+00,
       -0.00000000e+00,  0.00000000e+00, -0.00000000e+00,  0.00000000e+00,
        3.04547010e-02,  1.22993491e-02])
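Many of the Lasso coefficients above are exactly zero, which is how Lasso performs feature selection. A raw array is hard to read, so pairing the coefficients with column names helps. The sketch below does this on synthetic data where only two features drive the target; `X_demo`, `y_demo`, `lasso_demo` and the `feat_*` names are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# Synthetic features; only feat_0 and feat_3 influence the target.
X_demo = pd.DataFrame(
    rng.normal(size=(300, 6)), columns=[f"feat_{i}" for i in range(6)]
)
y_demo = (
    3.0 * X_demo["feat_0"]
    - 2.0 * X_demo["feat_3"]
    + rng.normal(scale=0.1, size=300)
)

lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)

# Label each coefficient with its column name.
coefs = pd.Series(lasso_demo.coef_, index=X_demo.columns)

# Keep the non-zero coefficients, ordered by absolute size.
kept = coefs[coefs != 0]
kept = kept.reindex(kept.abs().sort_values(ascending=False).index)

print(kept)
print("features dropped by Lasso:", int((coefs == 0).sum()))
```

Sorting by absolute value matters because a large negative coefficient is as influential as a large positive one.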
# plotting mean test and train scores with alpha (Lasso results)
cv_results_l['param_alpha'] = cv_results_l['param_alpha'].astype('float32')

# plotting
plt.plot(cv_results_l['param_alpha'], cv_results_l['mean_train_score'])
plt.plot(cv_results_l['param_alpha'], cv_results_l['mean_test_score'])
plt.xlabel('alpha')
plt.ylabel('Negative Mean Absolute Error')

plt.title("Negative Mean Absolute Error and alpha")
plt.legend(['train score', 'test score'], loc='upper left')
plt.show()
model_cv.best_params_
{'alpha': 0.1}
ridge = Ridge(alpha = 0.1)
ridge.fit(X_train,y_train)

y_pred_train = ridge.predict(X_train)
print(r2_score(y_train,y_pred_train))

y_pred_test = ridge.predict(X_test)
print(r2_score(y_test,y_pred_test))
0.957091869174661
0.7467659898941572
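# Pair the intercept and coefficients with their feature names
model_parameter = list(ridge.coef_)
model_parameter.insert(0,ridge.intercept_)
# Index.insert returns a new Index, so the result has to be assigned;
# X_train.columns is used because df_train still contains SalePrice
cols = X_train.columns.insert(0,'constant')
ridge_coef = pd.DataFrame(list(zip(cols,model_parameter)))
ridge_coef.columns = ['Feature','Coef']
ridge_coef.sort_values(by='Coef',ascending=False).head(10)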

lasso = Lasso(alpha=0.001)
lasso.fit(X_train,y_train)

y_train_pred = lasso.predict(X_train)
y_test_pred = lasso.predict(X_test)

print(r2_score(y_true=y_train,y_pred=y_train_pred))
print(r2_score(y_true=y_test,y_pred=y_test_pred))
0.9166555212910448
0.8536281526856487
Alpha used for the final Lasso model : {'alpha': 0.001} (the grid search reported {'alpha': 0.0001} as best)
Best alpha value for Ridge : {'alpha': 0.1}

The most prominent features are:

OverallCond
BsmtFullBath
ExterQual
BsmtCond
GarageArea
Functional
The optimal values of lambda for ridge and lasso regression are:

Ridge - 4.0
Lasso - 0.01