# Regularization

## Why Regularize?

When we try to fit a model closely to our data, we often overfit. Regularization discourages overly complex models by adding a penalty to the loss function.

### The Bias-Variance Tradeoff

When we did Linear Regression, we briefly talked about the Bias-Variance Tradeoff.

![](http://scott.fortmann-roe.com/docs/docs/BiasVariance/biasvariance.png)

![](https://miro.medium.com/max/544/1*Y-yJiR0FzMgchPA-Fm5c1Q.jpeg)

**High bias** 

 - Systematic error in predictions (i.e. the average prediction is off from the true value)
 - Bias is about the strength of assumptions the model makes
 - Underfit models tend to have high bias


**High variance**

 - The model is highly sensitive to changes in the data
 - Overfit models tend to have low bias and high variance
    
    
![](https://gblobscdn.gitbook.com/assets%2F-LvBP1svpACTB1R1x_U4%2F-LvNWUoWieQqaGmU_gl9%2F-LvNoby-llz4QzAK15nL%2Fimage.png?alt=media&token=41720ce9-bb66-4419-9bd8-640abf1fc415)

 - Underfit models fail to capture all of the information in the data
 - Overfit models fit to the noise in the data and fail to generalize


**How would we know if our model is over or underfit?**
 - Use a train-test split and compare the training error to the testing error
 - As model complexity increases, so does the risk of overfitting
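
To make the train/test comparison concrete, here is a minimal sketch on synthetic data. The data, the polynomial degrees, and the helper name `fit_and_score` are illustrative assumptions and are not part of the housing example used later. It fits polynomials of increasing degree and reports training and testing error for each.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic data: a cubic signal plus noise (illustrative only)
x = rng.uniform(-1, 1, size=60)
y = x**3 - 0.5 * x + rng.normal(scale=0.1, size=x.size)

# Simple split: first 40 points for training, last 20 for testing
x_train, x_test = x[:40], x[40:]
y_train, y_test = y[:40], y[40:]


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train MSE, test MSE)."""
    coefs = np.polyfit(x_train, y_train, deg=degree)
    train_mse = np.mean((y_train - np.polyval(coefs, x_train)) ** 2)
    test_mse = np.mean((y_test - np.polyval(coefs, x_test)) ** 2)
    return train_mse, test_mse


scores = {degree: fit_and_score(degree) for degree in (1, 3, 10)}
for degree, (train_mse, test_mse) in scores.items():
    print(f"degree {degree:>2}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

Training error can only go down as the degree grows, so it cannot flag overfitting on its own. A testing error that is much larger than the training error is the warning sign.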

## Ridge and Lasso

Ridge and Lasso regression are two examples of penalized estimation. Penalized estimation makes some or all of the coefficients smaller in magnitude (closer to zero). Some penalties also perform variable selection: they set some coefficients exactly equal to zero while shrinking the others.

In Ridge regression, the cost function is changed by adding a penalty term proportional to the sum of the squared coefficients. The intercept $b$ is not penalized.

$$ \text{cost_function_ridge} = \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p m_j^2 = \sum_{i=1}^n\left(y_i - \sum_{j=1}^p m_jx_{ij} - b\right)^2 + \lambda \sum_{j=1}^p m_j^2$$

Lasso regression (Least Absolute Shrinkage and Selection Operator) is very similar to Ridge regression, except that the penalty term uses the absolute values of the coefficients instead of their squares.

$$ \text{cost_function_lasso} = \sum_{i=1}^n(y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^p \mid m_j \mid = \sum_{i=1}^n\left(y_i - \sum_{j=1}^p m_jx_{ij} - b\right)^2 + \lambda \sum_{j=1}^p \mid m_j \mid$$
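
The two cost functions can be written out directly. The following is a small sketch with made-up numbers; the function names `ridge_cost` and `lasso_cost` are hypothetical helpers, not scikit-learn functions.

```python
import numpy as np


def ridge_cost(X, y, m, b, lam):
    """Residual sum of squares plus lam times the sum of squared coefficients."""
    residuals = y - (X @ m + b)
    return np.sum(residuals**2) + lam * np.sum(m**2)


def lasso_cost(X, y, m, b, lam):
    """Residual sum of squares plus lam times the sum of absolute coefficients."""
    residuals = y - (X @ m + b)
    return np.sum(residuals**2) + lam * np.sum(np.abs(m))


# Tiny hand-checkable example (made-up numbers)
X = np.array([[1.0, 2.0], [3.0, 4.0]])
y = np.array([0.0, 0.0])
m = np.array([1.0, -2.0])
b = 0.5

print(ridge_cost(X, y, m, b, lam=2.0))  # 26.5 + 2 * (1 + 4) = 36.5
print(lasso_cost(X, y, m, b, lam=2.0))  # 26.5 + 2 * (1 + 2) = 32.5
```

Note that scikit-learn names the penalty strength `alpha` rather than lambda, and its `Lasso` divides the squared-error term by twice the number of samples, so its `alpha` values are not directly comparable to the lambda in the formulas above.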

So we're penalizing large coefficients -- what are the effects/implications of that?
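
One way to explore that question is to watch the coefficients as the penalty grows. The sketch below uses the closed-form Ridge solution on synthetic, centered data (so the intercept can be ignored). The data and the helper `ridge_coefs` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, centered data (illustrative only)
X = rng.normal(size=(50, 3))
X = X - X.mean(axis=0)
y = X @ np.array([3.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=50)
y = y - y.mean()


def ridge_coefs(X, y, lam):
    """Closed-form ridge solution: (X^T X + lam * I)^(-1) X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)


lams = [0.0, 1.0, 10.0, 100.0, 1000.0]
norms = [np.linalg.norm(ridge_coefs(X, y, lam)) for lam in lams]
for lam, norm in zip(lams, norms):
    print(f"lambda = {lam:>6}: ||m|| = {norm:.3f}")
```

With `lam = 0` this is ordinary least squares. As the penalty grows, the coefficient vector shrinks toward zero: the model becomes less sensitive to the particular training sample (lower variance) at the price of some bias.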

### Standardization before Regularization

An important step before using either Lasso or Ridge regularization is to standardize your data so that all features are on the same scale. Regularization penalizes larger coefficients, and a coefficient's size depends on the units of its feature, so **if you have features that are on different scales, some will get unfairly penalized**. A downside of standardization is that the values of the coefficients become less interpretable: they must be transformed back to their original scale if you want to interpret how a one-unit change in a feature impacts the target variable.
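
The sketch below illustrates why scale matters, using made-up data in which the same feature is expressed in kilometres and in metres. The helper names `ols_slope` and `standardize` are hypothetical. The fitted slope changes by a factor of 1000 purely because of units, so a penalty on coefficient size would treat the two versions very differently.

```python
import numpy as np

rng = np.random.default_rng(1)

# Same feature expressed in two units (made-up data)
distance_km = rng.uniform(1, 10, size=100)
distance_m = distance_km * 1000
y = 5.0 * distance_km + rng.normal(scale=0.5, size=100)


def ols_slope(x, y):
    """Slope of a one-feature least-squares fit with an intercept."""
    x_centered = x - x.mean()
    return np.sum(x_centered * (y - y.mean())) / np.sum(x_centered**2)


def standardize(x):
    """Rescale to mean 0 and standard deviation 1."""
    return (x - x.mean()) / x.std()


slope_km = ols_slope(distance_km, y)
slope_m = ols_slope(distance_m, y)
print(slope_km / slope_m)  # the km coefficient is about 1000 times larger

# After standardizing, the choice of unit no longer matters
slope_km_std = ols_slope(standardize(distance_km), y)
slope_m_std = ols_slope(standardize(distance_m), y)
print(np.isclose(slope_km_std, slope_m_std))
```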

**Scaler documentation:**

* https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html
* https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.StandardScaler.html

## Let's Code! 

Start with a regular Linear Regression.

In [1]:
import pandas as pd
import numpy as np

import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.linear_model import LinearRegression
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.preprocessing import OneHotEncoder

# import warnings
# warnings.filterwarnings('ignore')

In [2]:
df = pd.read_csv('data/ames_train.csv') # Ames housing data

# Drop sale detail columns 
df = df.drop(columns = ['Id', 'MoSold', 'YrSold', 'SaleType', 'SaleCondition'])

# Create X and y
y = df['SalePrice']
X = df.drop(columns=['SalePrice'])

# Split data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

### Time to Clean/Process

In [3]:
# Explore X_train
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1095 entries, 1023 to 1126
Data columns (total 75 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   MSSubClass     1095 non-null   int64  
 1   MSZoning       1095 non-null   object 
 2   LotFrontage    895 non-null    float64
 3   LotArea        1095 non-null   int64  
 4   Street         1095 non-null   object 
 5   Alley          70 non-null     object 
 6   LotShape       1095 non-null   object 
 7   LandContour    1095 non-null   object 
 8   Utilities      1095 non-null   object 
 9   LotConfig      1095 non-null   object 
 10  LandSlope      1095 non-null   object 
 11  Neighborhood   1095 non-null   object 
 12  Condition1     1095 non-null   object 
 13  Condition2     1095 non-null   object 
 14  BldgType       1095 non-null   object 
 15  HouseStyle     1095 non-null   object 
 16  OverallQual    1095 non-null   int64  
 17  OverallCond    1095 non-null   int64  
 18  YearB

In [8]:
# Let's check the fraction of our training data that's null per column
null_perc = X_train.isna().sum()/ len(X_train)

In [12]:
null_perc.sort_values(ascending=False).head(10)

PoolQC          0.994521
MiscFeature     0.960731
Alley           0.936073
Fence           0.800913
FireplaceQu     0.467580
LotFrontage     0.182648
GarageQual      0.052968
GarageType      0.052968
GarageYrBlt     0.052968
GarageFinish    0.052968
dtype: float64

In [17]:
# Drop columns where more than 10% of the values are null
null_cols = list(null_perc.loc[null_perc > .1].index)
print(null_cols)

X_train = X_train.drop(columns=null_cols)
X_test = X_test.drop(columns=null_cols)

['LotFrontage', 'Alley', 'FireplaceQu', 'PoolQC', 'Fence', 'MiscFeature']


In [18]:
X_train.columns

Index(['MSSubClass', 'MSZoning', 'LotArea', 'Street', 'LotShape',
       'LandContour', 'Utilities', 'LotConfig', 'LandSlope', 'Neighborhood',
       'Condition1', 'Condition2', 'BldgType', 'HouseStyle', 'OverallQual',
       'OverallCond', 'YearBuilt', 'YearRemodAdd', 'RoofStyle', 'RoofMatl',
       'Exterior1st', 'Exterior2nd', 'MasVnrType', 'MasVnrArea', 'ExterQual',
       'ExterCond', 'Foundation', 'BsmtQual', 'BsmtCond', 'BsmtExposure',
       'BsmtFinType1', 'BsmtFinSF1', 'BsmtFinType2', 'BsmtFinSF2', 'BsmtUnfSF',
       'TotalBsmtSF', 'Heating', 'HeatingQC', 'CentralAir', 'Electrical',
       '1stFlrSF', '2ndFlrSF', 'LowQualFinSF', 'GrLivArea', 'BsmtFullBath',
       'BsmtHalfBath', 'FullBath', 'HalfBath', 'BedroomAbvGr', 'KitchenAbvGr',
       'KitchenQual', 'TotRmsAbvGrd', 'Functional', 'Fireplaces', 'GarageType',
       'GarageYrBlt', 'GarageFinish', 'GarageCars', 'GarageArea', 'GarageQual',
       'GarageCond', 'PavedDrive', 'WoodDeckSF', 'OpenPorchSF',
       'EnclosedPorc

In [19]:
X_train['MSSubClass'].dtype

dtype('int64')

In [20]:
X_train.info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1095 entries, 1023 to 1126
Data columns (total 69 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   MSSubClass     1095 non-null   int64  
 1   MSZoning       1095 non-null   object 
 2   LotArea        1095 non-null   int64  
 3   Street         1095 non-null   object 
 4   LotShape       1095 non-null   object 
 5   LandContour    1095 non-null   object 
 6   Utilities      1095 non-null   object 
 7   LotConfig      1095 non-null   object 
 8   LandSlope      1095 non-null   object 
 9   Neighborhood   1095 non-null   object 
 10  Condition1     1095 non-null   object 
 11  Condition2     1095 non-null   object 
 12  BldgType       1095 non-null   object 
 13  HouseStyle     1095 non-null   object 
 14  OverallQual    1095 non-null   int64  
 15  OverallCond    1095 non-null   int64  
 16  YearBuilt      1095 non-null   int64  
 17  YearRemodAdd   1095 non-null   int64  
 18  RoofS

In [61]:
# Start with the continuous variables

# Grab only numeric features
num_cols = [col for col in X_train.columns if X_train[col].dtype in [np.int64, np.float64]]

X_train_cont = X_train[num_cols]
X_test_cont = X_test[num_cols]

# Impute missing values with 0 using SimpleImputer
# (most columns look like they just don't have details)
imputer = SimpleImputer(strategy = 'constant', fill_value = 0)

X_train_impute = imputer.fit_transform(X_train_cont)
X_test_impute = imputer.transform(X_test_cont)

# Scale the train and test data
scaler = MinMaxScaler()

X_train_scaled = scaler.fit_transform(X_train_impute)
X_test_scaled = scaler.transform(X_test_impute)

In [62]:
# Now time for the categorical columns

# Create X_cat which contains only the categorical variables
cat_cols = [col for col in X_train.columns if X_train[col].dtype == object]

X_train_cat = X_train[cat_cols]
X_test_cat = X_test[cat_cols]

# Fill missing values with the string 'missing'
cat_imputer = SimpleImputer(strategy = 'constant', fill_value = 'missing')

X_train_cat = pd.DataFrame(cat_imputer.fit_transform(X_train_cat), columns=cat_cols)
X_test_cat = pd.DataFrame(cat_imputer.transform(X_test_cat), columns=cat_cols)

In [63]:
X_train_cat['HouseStyle'].value_counts(normalize=True).iloc[0]

0.4940639269406393

In [64]:
# Exploring column percentages

# Let's remove any column where the most common value is more than 90% of that col
low_variance_cols = []

for col in X_train_cat.columns:
    col_perc = X_train_cat[col].value_counts(normalize=True)
    display(col_perc)
    
    if col_perc.iloc[0] > .9:
        print(f"You should remove {col}")
        low_variance_cols.append(col)

RL         0.790868
RM         0.149772
FV         0.042922
RH         0.012785
C (all)    0.003653
Name: MSZoning, dtype: float64

Pave    0.996347
Grvl    0.003653
Name: Street, dtype: float64

You should remove Street


Reg    0.621918
IR1    0.338813
IR2    0.031963
IR3    0.007306
Name: LotShape, dtype: float64

Lvl    0.905936
Bnk    0.041096
HLS    0.029224
Low    0.023744
Name: LandContour, dtype: float64

You should remove LandContour


AllPub    0.999087
NoSeWa    0.000913
Name: Utilities, dtype: float64

You should remove Utilities


Inside     0.699543
Corner     0.190868
CulDSac    0.073059
FR2        0.033790
FR3        0.002740
Name: LotConfig, dtype: float64

Gtl    0.946119
Mod    0.045662
Sev    0.008219
Name: LandSlope, dtype: float64

You should remove LandSlope


NAmes      0.152511
CollgCr    0.102283
OldTown    0.079452
Edwards    0.075799
Somerst    0.056621
NWAmes     0.054795
Gilbert    0.053881
NridgHt    0.052968
Sawyer     0.046575
BrkSide    0.040183
SawyerW    0.036530
Crawfor    0.035616
Mitchel    0.034703
NoRidge    0.028311
Timber     0.024658
IDOTRR     0.022831
StoneBr    0.018265
ClearCr    0.017352
SWISU      0.016438
Blmngtn    0.013699
BrDale     0.011872
MeadowV    0.009132
Veenker    0.008219
NPkVill    0.006393
Blueste    0.000913
Name: Neighborhood, dtype: float64

Norm      0.863927
Feedr     0.053881
Artery    0.033790
RRAn      0.015525
PosN      0.012785
RRAe      0.009132
PosA      0.005479
RRNn      0.004566
RRNe      0.000913
Name: Condition1, dtype: float64

Norm      0.991781
Feedr     0.002740
PosN      0.001826
Artery    0.001826
RRAe      0.000913
RRAn      0.000913
Name: Condition2, dtype: float64

You should remove Condition2


1Fam      0.835616
TwnhsE    0.076712
Duplex    0.033790
Twnhs     0.029224
2fmCon    0.024658
Name: BldgType, dtype: float64

1Story    0.494064
2Story    0.309589
1.5Fin    0.104110
SLvl      0.045662
SFoyer    0.021005
1.5Unf    0.010046
2.5Unf    0.009132
2.5Fin    0.006393
Name: HouseStyle, dtype: float64

Gable      0.769863
Hip        0.205479
Flat       0.010046
Gambrel    0.008219
Mansard    0.004566
Shed       0.001826
Name: RoofStyle, dtype: float64

CompShg    0.982648
Tar&Grv    0.008219
WdShngl    0.003653
WdShake    0.002740
Roll       0.000913
Metal      0.000913
ClyTile    0.000913
Name: RoofMatl, dtype: float64

You should remove RoofMatl


VinylSd    0.359817
HdBoard    0.152511
MetalSd    0.146119
Wd Sdng    0.145205
Plywood    0.068493
CemntBd    0.038356
BrkFace    0.034703
Stucco     0.017352
WdShing    0.017352
AsbShng    0.014612
BrkComm    0.001826
ImStucc    0.000913
CBlock     0.000913
Stone      0.000913
AsphShn    0.000913
Name: Exterior1st, dtype: float64

VinylSd    0.351598
HdBoard    0.140639
Wd Sdng    0.140639
MetalSd    0.139726
Plywood    0.094064
CmentBd    0.037443
Wd Shng    0.029224
Stucco     0.019178
AsbShng    0.015525
BrkFace    0.013699
ImStucc    0.005479
Brk Cmn    0.005479
Stone      0.002740
AsphShn    0.002740
CBlock     0.000913
Other      0.000913
Name: Exterior2nd, dtype: float64

None       0.579909
BrkFace    0.315068
Stone      0.090411
BrkCmn     0.010959
missing    0.003653
Name: MasVnrType, dtype: float64

TA    0.622831
Gd    0.331507
Ex    0.035616
Fa    0.010046
Name: ExterQual, dtype: float64

TA    0.876712
Gd    0.098630
Fa    0.021918
Ex    0.001826
Po    0.000913
Name: ExterCond, dtype: float64

PConc     0.449315
CBlock    0.427397
BrkTil    0.098630
Slab      0.017352
Stone     0.004566
Wood      0.002740
Name: Foundation, dtype: float64

TA         0.442922
Gd         0.421005
Ex         0.085845
Fa         0.025571
missing    0.024658
Name: BsmtQual, dtype: float64

TA         0.892237
Gd         0.047489
Fa         0.034703
missing    0.024658
Po         0.000913
Name: BsmtCond, dtype: float64

No         0.656621
Av         0.148858
Gd         0.089498
Mn         0.080365
missing    0.024658
Name: BsmtExposure, dtype: float64

Unf        0.287671
GLQ        0.285845
ALQ        0.155251
BLQ        0.102283
Rec        0.092237
LwQ        0.052055
missing    0.024658
Name: BsmtFinType1, dtype: float64

Unf        0.863927
Rec        0.038356
LwQ        0.030137
missing    0.024658
BLQ        0.018265
ALQ        0.015525
GLQ        0.009132
Name: BsmtFinType2, dtype: float64

GasA     0.977169
GasW     0.013699
Grav     0.003653
Wall     0.002740
OthW     0.001826
Floor    0.000913
Name: Heating, dtype: float64

You should remove Heating


Ex    0.506849
TA    0.291324
Gd    0.165297
Fa    0.035616
Po    0.000913
Name: HeatingQC, dtype: float64

Y    0.928767
N    0.071233
Name: CentralAir, dtype: float64

You should remove CentralAir


SBrkr      0.915068
FuseA      0.060274
FuseF      0.021005
FuseP      0.002740
missing    0.000913
Name: Electrical, dtype: float64

You should remove Electrical


TA    0.502283
Gd    0.401826
Ex    0.066667
Fa    0.029224
Name: KitchenQual, dtype: float64

Typ     0.925114
Min2    0.025571
Min1    0.024658
Mod     0.011872
Maj1    0.008219
Maj2    0.003653
Sev     0.000913
Name: Functional, dtype: float64

You should remove Functional


Attchd     0.594521
Detchd     0.263927
BuiltIn    0.063014
missing    0.052968
Basment    0.013699
CarPort    0.006393
2Types     0.005479
Name: GarageType, dtype: float64

Unf        0.410046
RFn        0.293151
Fin        0.243836
missing    0.052968
Name: GarageFinish, dtype: float64

TA         0.900457
missing    0.052968
Fa         0.031963
Gd         0.010959
Ex         0.002740
Po         0.000913
Name: GarageQual, dtype: float64

You should remove GarageQual


TA         0.908676
missing    0.052968
Fa         0.024658
Gd         0.008219
Po         0.003653
Ex         0.001826
Name: GarageCond, dtype: float64

You should remove GarageCond


Y    0.916895
N    0.061187
P    0.021918
Name: PavedDrive, dtype: float64

You should remove PavedDrive


In [65]:
low_variance_cols

['Street',
 'LandContour',
 'Utilities',
 'LandSlope',
 'Condition2',
 'RoofMatl',
 'Heating',
 'CentralAir',
 'Electrical',
 'Functional',
 'GarageQual',
 'GarageCond',
 'PavedDrive']
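
The loop above can also be wrapped in a reusable function. This is a minimal sketch; the name `dominant_value_cols` and the toy DataFrame are made up for illustration.

```python
import pandas as pd


def dominant_value_cols(df, threshold=0.9):
    """Return columns whose most common value exceeds `threshold` of the rows."""
    flagged = []
    for col in df.columns:
        # value_counts sorts descending, so the first entry is the most common value
        top_share = df[col].value_counts(normalize=True).iloc[0]
        if top_share > threshold:
            flagged.append(col)
    return flagged


# Toy frame with made-up values
toy = pd.DataFrame({
    "Street": ["Pave"] * 19 + ["Grvl"],       # 95% one value
    "LotShape": ["Reg"] * 12 + ["IR1"] * 8,   # 60% one value
})
print(dominant_value_cols(toy))
```

In this notebook the equivalent call would be `dominant_value_cols(X_train_cat)`, computed on the training set only so the test set does not influence which columns are kept.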

In [66]:
# Now drop those
X_train_cat = X_train_cat.drop(columns = low_variance_cols)

In [67]:
X_test_cat = X_test_cat.drop(columns = low_variance_cols)

In [68]:
# OneHotEncode categorical variables
ohe = OneHotEncoder(handle_unknown='ignore')

X_train_ohe = ohe.fit_transform(X_train_cat)
X_test_ohe = ohe.transform(X_test_cat)

# Convert these columns into a DataFrame 
ohe_col_names = ohe.get_feature_names_out(input_features=X_train_cat.columns)
cat_train_df = pd.DataFrame(X_train_ohe.toarray(), columns=ohe_col_names)
cat_test_df = pd.DataFrame(X_test_ohe.toarray(), columns=ohe_col_names)

In [69]:
len(num_cols)

33

In [70]:
# Put it all back together
train_df = pd.concat([pd.DataFrame(X_train_scaled, columns = num_cols), cat_train_df], axis=1)
test_df = pd.concat([pd.DataFrame(X_test_scaled, columns = num_cols), cat_test_df], axis=1)

# Fit the model
linreg = LinearRegression()

linreg.fit(train_df, y_train)

LinearRegression()
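
The manual impute, scale, encode, and concatenate steps can also be bundled with scikit-learn's `Pipeline` and `ColumnTransformer`. This is an alternative to the approach used in this notebook, not what the cells above do. The sketch below runs on a tiny made-up DataFrame; the column names are borrowed from the housing data only for readability.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

# Toy stand-in for the housing data (made-up values)
toy_X = pd.DataFrame({
    "LotArea": [8000.0, 9500.0, np.nan, 12000.0, 7000.0, 10500.0],
    "OverallQual": [7, 6, 7, 5, 8, 5],
    "HouseStyle": ["2Story", "1Story", "2Story", "1Story", "2Story", "1.5Fin"],
})
toy_y = pd.Series([200000, 180000, 220000, 150000, 250000, 140000])

# Numeric columns: fill missing values with 0, then scale to [0, 1]
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value=0)),
    ("scale", MinMaxScaler()),
])
# Categorical columns: fill missing values with 'missing', then one-hot encode
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer([
    ("num", numeric_steps, ["LotArea", "OverallQual"]),
    ("cat", categorical_steps, ["HouseStyle"]),
])

model = Pipeline([("preprocess", preprocess), ("linreg", LinearRegression())])
model.fit(toy_X, toy_y)
print(model.predict(toy_X).shape)
```

A pipeline fits every transformer on the training data only and reapplies it at prediction time, which makes it harder to leak test-set information by accident.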

In [71]:
# Write a quick evaluation function
def evaluate(train_actual, train_predicted, test_actual, test_predicted):
    '''
    Takes in both actual and predicted values, for the train and test set
    Then prints the scores based on those values
    
    Inputs:
    -------
    train_actual - actual target values for the train set
    train_predicted - predicted target values for the train set
    test_actual - actual target values for the test set
    test_predicted - predicted target values for the test set
    '''
    print('Train R2:', r2_score(train_actual, train_predicted))
    print('Test R2:', r2_score(test_actual, test_predicted))
    print("*****")
    print('Train MSE:', mean_squared_error(train_actual, train_predicted))
    print('Test MSE:', mean_squared_error(test_actual, test_predicted))
    print("*****")
    print('Train RMSE:', np.sqrt(mean_squared_error(train_actual, train_predicted)))
    print('Test RMSE:', np.sqrt(mean_squared_error(test_actual, test_predicted)))

In [72]:
# Grab predictions and evaluate
train_preds = linreg.predict(train_df)
test_preds = linreg.predict(test_df)


evaluate(y_train, train_preds, y_test, test_preds)

Train R2: 0.8902021216223257
Test R2: -2.2071686650036666e+20
*****
Train MSE: 666631794.0310502
Test MSE: 1.546189862898988e+30
*****
Train RMSE: 25819.213660199843
Test RMSE: 1243458830399699.2


In [73]:
# Plot residuals?
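
One possible way to fill in this cell is sketched below. The helper `plot_residuals` is a hypothetical function, shown here with made-up numbers; in this notebook it could be called as `plot_residuals(y_test, test_preds)`.

```python
import matplotlib.pyplot as plt
import numpy as np


def plot_residuals(actual, predicted, ax=None):
    """Scatter residuals (actual - predicted) against predictions."""
    actual = np.asarray(actual)
    predicted = np.asarray(predicted)
    residuals = actual - predicted

    if ax is None:
        _, ax = plt.subplots()
    ax.scatter(predicted, residuals, alpha=0.5)
    ax.axhline(0, color="black", linewidth=1)  # reference line at zero error
    ax.set_xlabel("Predicted value")
    ax.set_ylabel("Residual (actual - predicted)")
    return ax


# Made-up numbers just to show the call
ax = plot_residuals([200, 150, 320], [210, 140, 300])
ax.set_title("Residuals (toy example)")
```

Residuals scattered evenly around the zero line suggest a reasonable fit; a few enormous residuals, or a clear pattern, suggest a problem worth investigating.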


In [75]:
# Explore coefficients
linreg.coef_

array([ 2.42939331e+04,  6.92941854e+04,  7.42320937e+04,  4.11704137e+04,
        2.65894826e+04,  6.05349865e+03,  4.75400935e+03,  4.76783702e+16,
        9.52046833e+15,  1.97336415e+16, -5.16149614e+16, -1.61448744e+16,
       -7.65010682e+15, -2.11906107e+15,  1.96642940e+16,  2.48520315e+04,
        2.27031926e+02,  2.91072310e+04,  8.35419678e+03, -8.27150894e+03,
       -5.42973472e+04,  3.98528927e+04,  1.75974960e+04, -2.04897988e+04,
        5.24989700e+04,  6.84785221e+02,  1.86865206e+04, -4.18988285e+03,
        8.94510617e+03,  3.56752782e+04,  2.12719550e+04,  5.93325235e+03,
       -2.80714220e+04, -1.24604900e+16, -1.24604900e+16, -1.24604900e+16,
       -1.24604900e+16, -1.24604900e+16, -2.51038140e+15, -2.51038139e+15,
       -2.51038140e+15, -2.51038140e+15, -6.66258627e+14, -6.66258627e+14,
       -6.66258627e+14, -6.66258627e+14, -6.66258627e+14,  3.41892602e+14,
        3.41892602e+14,  3.41892602e+14,  3.41892602e+14,  3.41892602e+14,
        3.41892602e+14,  

**Evaluate**

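- Train R2 is about 0.89, but test R2 is hugely negative (about -2.2e20) and test RMSE is on the order of 1e15: the model fits the training data and fails badly on unseen data.
- Many coefficients are on the order of 1e14 to 1e16, far beyond the scale of the target. Coefficients this large signal an unstable fit, which is what regularization is meant to tame.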
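
One plausible contributor to coefficients this large is exact collinearity among the one-hot columns: `OneHotEncoder` keeps every category by default, so the dummy columns for one feature always sum to 1, which duplicates the intercept. The sketch below shows the effect on a tiny made-up example; it is an illustration of the idea, not a diagnosis of this particular fit.

```python
import numpy as np

# Three categories, one-hot encoded with all three dummy columns kept
labels = np.array([0, 1, 2, 0, 1, 2])
dummies = np.eye(3)[labels]

# Design matrix as a model with an intercept sees it: [1, d0, d1, d2]
X = np.column_stack([np.ones(len(labels)), dummies])

print(X.shape[1])                 # 4 columns
print(np.linalg.matrix_rank(X))   # rank 3: the dummies sum to the intercept column
```

When the design matrix is rank deficient, infinitely many coefficient vectors give the same training fit, and an unpenalized solver may return huge values that cancel each other out. A penalty on coefficient size picks a small solution instead. Dropping one category per feature (the `drop` parameter of `OneHotEncoder`) is another common remedy.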


In [None]:
# Let's wrap up that coefficient exploration in a function
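
A minimal sketch of such a function follows. The name `coef_summary` and the toy data are assumptions made for illustration; in this notebook the call would look like `coef_summary(linreg, train_df.columns)`.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression


def coef_summary(model, feature_names):
    """Return a fitted linear model's coefficients, largest magnitude first."""
    coefs = pd.Series(model.coef_, index=feature_names)
    order = coefs.abs().sort_values(ascending=False).index
    return coefs.reindex(order)


# Toy fit on made-up data: y depends strongly on "a" and weakly on "b"
rng = np.random.default_rng(0)
toy = pd.DataFrame({"a": rng.normal(size=50), "b": rng.normal(size=50)})
target = 10 * toy["a"] - 1 * toy["b"] + rng.normal(scale=0.1, size=50)

toy_model = LinearRegression().fit(toy, target)
print(coef_summary(toy_model, toy.columns))
```

Because it only relies on the `coef_` attribute, the same helper should work for the `Lasso` and `Ridge` models fitted later.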


## Fitting Ridge and Lasso

* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Lasso.html
* https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Ridge.html

### LASSO

In [76]:
from sklearn.linear_model import Lasso

lasso = Lasso() # Lasso uses an L1 penalty (default alpha=1.0)

# Fit
lasso.fit(train_df, y_train)

# Predict
train_preds = lasso.predict(train_df)
test_preds = lasso.predict(test_df)

# Evaluate
evaluate(y_train, train_preds, y_test, test_preds)

Train R2: 0.8901917658981585
Test R2: 0.8615905414873004
*****
Train MSE: 666694668.2421196
Test MSE: 969601032.6483977
*****
Train RMSE: 25820.43121719929
Test RMSE: 31138.41731123144




In [79]:
dict(zip(train_df.columns, lasso.coef_))

{'MSSubClass': 23045.749091753994,
 'LotArea': 69153.836145813,
 'OverallQual': 74595.38206321676,
 'OverallCond': 41107.981108843065,
 'YearBuilt': 26220.382623718528,
 'YearRemodAdd': 5988.335765107996,
 'MasVnrArea': 4648.09022597526,
 'BsmtFinSF1': 0.0,
 'BsmtFinSF2': 4849.4559395726865,
 'BsmtUnfSF': 7777.920844871131,
 'TotalBsmtSF': 1741.57472139853,
 '1stFlrSF': 63604.76042320385,
 '2ndFlrSF': 96895.97584864669,
 'LowQualFinSF': 27417.491610494522,
 'GrLivArea': 62884.56196032811,
 'BsmtFullBath': 24805.356915284305,
 'BsmtHalfBath': 104.46421495761523,
 'FullBath': 28909.390378450902,
 'HalfBath': 8301.817433378845,
 'BedroomAbvGr': -7690.268164069751,
 'KitchenAbvGr': -53379.47417252352,
 'TotRmsAbvGrd': 39514.12138214492,
 'Fireplaces': 17680.89518126145,
 'GarageYrBlt': -3530.872923625518,
 'GarageCars': 52442.48298413535,
 'GarageArea': 149.46957727470695,
 'WoodDeckSF': 18480.494824553472,
 'OpenPorchSF': -4251.982556783525,
 'EnclosedPorch': 8603.840348301603,
 '3SsnPorc

In [84]:
# Adjust HYPERPARAMETERS -- check documentation!
lasso_v2 = Lasso(alpha=100) # Larger alpha means a stronger L1 penalty

# Fit
lasso_v2.fit(train_df, y_train)

# Predict
train_preds = lasso_v2.predict(train_df)
test_preds = lasso_v2.predict(test_df)

# Evaluate
evaluate(y_train, train_preds, y_test, test_preds)

Train R2: 0.8789951036421941
Test R2: 0.8744740298069457
*****
Train MSE: 734674588.7755469
Test MSE: 879348215.2970793
*****
Train RMSE: 27104.881272116778
Test RMSE: 29653.80608449916


In [87]:
# Check Lasso Coefficients
dict(zip(train_df.columns, lasso_v2.coef_))

{'MSSubClass': -0.0,
 'LotArea': 24923.94772732408,
 'OverallQual': 85954.33339465312,
 'OverallCond': 29673.53503167045,
 'YearBuilt': 14292.093234864264,
 'YearRemodAdd': 6132.078140285386,
 'MasVnrArea': 0.0,
 'BsmtFinSF1': 0.0,
 'BsmtFinSF2': 0.0,
 'BsmtUnfSF': 1794.0136831131008,
 'TotalBsmtSF': 0.0,
 '1stFlrSF': 0.0,
 '2ndFlrSF': 36365.147505163244,
 'LowQualFinSF': 0.0,
 'GrLivArea': 141324.8708247482,
 'BsmtFullBath': 21861.94646721838,
 'BsmtHalfBath': 0.0,
 'FullBath': 24157.87296652644,
 'HalfBath': 7269.6917299883635,
 'BedroomAbvGr': 0.0,
 'KitchenAbvGr': -0.0,
 'TotRmsAbvGrd': 30292.337711600027,
 'Fireplaces': 19206.57206481014,
 'GarageYrBlt': -0.0,
 'GarageCars': 48174.14738900044,
 'GarageArea': 0.0,
 'WoodDeckSF': 13362.849577853345,
 'OpenPorchSF': -0.0,
 'EnclosedPorch': -0.0,
 '3SsnPorch': 1352.8826606333375,
 'ScreenPorch': 13759.255663387308,
 'PoolArea': 0.0,
 'MiscVal': -0.0,
 'MSZoning_C (all)': -0.0,
 'MSZoning_FV': 933.3322023920897,
 'MSZoning_RH': -0.0,
 

### Ridge

In [86]:
from sklearn.linear_model import Ridge

ridge = Ridge() # Ridge uses an L2 penalty (default alpha=1.0)

# Fit
ridge.fit(train_df, y_train)
# Predict
train_preds = ridge.predict(train_df)
test_preds = ridge.predict(test_df)

# Evaluate
evaluate(y_train, train_preds, y_test, test_preds)

Train R2: 0.8887013079926785
Test R2: 0.8668141030170736
*****
Train MSE: 675743901.6347739
Test MSE: 933008369.7784265
*****
Train RMSE: 25995.07456490121
Test RMSE: 30545.185705417254


In [95]:
max(linreg.coef_)

4.767837024023809e+16

In [94]:
max(dict(zip(train_df.columns, ridge.coef_)).values())

75202.13034289141

In [105]:
# Adjust HYPERPARAMETERS
ridge_v2 = Ridge(alpha=10) # Larger alpha means a stronger L2 penalty

# Fit
ridge_v2.fit(train_df, y_train)
# Predict
train_preds = ridge_v2.predict(train_df)
test_preds = ridge_v2.predict(test_df)

# Evaluate
evaluate(y_train, train_preds, y_test, test_preds)

Train R2: 0.8741288928408802
Test R2: 0.8610739032837424
*****
Train MSE: 764219520.6498665
Test MSE: 973220242.932567
*****
Train RMSE: 27644.520626154226
Test RMSE: 31196.478053340685


In [106]:
# Check Ridge Coefficients
max(ridge_v2.coef_)

47009.208354857605
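
Rather than trying `alpha` values by hand, the penalty strength can be chosen by cross-validation. The sketch below uses scikit-learn's `RidgeCV` on made-up data; the grid of candidate values is an arbitrary assumption. `LassoCV` plays the same role for Lasso.

```python
import numpy as np
from sklearn.linear_model import RidgeCV

rng = np.random.default_rng(0)

# Made-up regression data
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + rng.normal(scale=1.0, size=100)

# Candidate penalty strengths on a log scale (an arbitrary grid)
alphas = np.logspace(-3, 3, 13)

ridge_cv = RidgeCV(alphas=alphas, cv=5).fit(X, y)
print("chosen alpha:", ridge_cv.alpha_)
```

Cross-validation uses only the training data, so the test set stays untouched for the final evaluation.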

### Let's Discuss

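- Both Lasso and Ridge bring test R2 back to roughly 0.86 to 0.87, compared with the unregularized model's hugely negative test R2.
- Raising the Lasso `alpha` from the default (1.0) to 100 lowered train R2 (0.890 to 0.879) but raised test R2 (0.862 to 0.874), and set many coefficients to exactly zero.
- Raising the Ridge `alpha` from the default (1.0) to 10 lowered both train R2 (0.889 to 0.874) and test R2 (0.867 to 0.861), so more regularization is not automatically better.
- The largest Ridge coefficient (about 75,000, or about 47,000 with `alpha=10`) is many orders of magnitude smaller than the largest unregularized coefficient (about 4.8e16).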


## Ridge & Lasso: Other benefits

### Ridge:
* We can "shrink down" the effects of predictor variables instead of deleting/zeroing them
* When you have features with high multicollinearity, Ridge tends to spread the weight across the correlated features instead of giving any one of them an extreme coefficient
* Since it keeps all features, the resulting model can be computationally expensive and harder to interpret when there are many variables

### Lasso:
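* When you have a lot of variables, it performs feature selection for you by setting some coefficients exactly to zero!
* Multicollinearity is also dealt with: among highly correlated features, Lasso tends to keep some and zero out the rest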
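
The difference in how the two penalties treat correlated features can be seen in an extreme case. In this sketch on made-up data, the second feature is an exact copy of the first; the variable names are chosen only for this illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)

# Made-up data: the second column is an exact copy of the first
x1 = rng.normal(size=200)
X = np.column_stack([x1, x1.copy()])
y = 4.0 * x1 + rng.normal(scale=0.1, size=200)

ridge_dup = Ridge(alpha=1.0).fit(X, y)
lasso_dup = Lasso(alpha=0.1).fit(X, y)

print("ridge coefficients:", ridge_dup.coef_)  # weight shared between the copies
print("lasso coefficients:", lasso_dup.coef_)  # often concentrated on one copy
```

Ridge splits the weight evenly between the identical columns. Lasso has no preference between them, and in practice its solver tends to put most of the weight on one and leave the other at or near zero.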


### Why not both??

Enter ElasticNet: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.ElasticNet.html
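
ElasticNet combines the L1 and L2 penalties, with `l1_ratio` controlling the mix. The sketch below fits it on made-up data and checks that `l1_ratio=1.0` reproduces Lasso at the same `alpha`; the data and parameter values are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

rng = np.random.default_rng(0)

# Made-up data: 8 features, three of which matter
X = rng.normal(size=(150, 8))
true_coefs = np.array([3.0, -2.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])
y = X @ true_coefs + rng.normal(scale=0.5, size=150)

# l1_ratio blends the penalties: 1.0 is pure L1 (Lasso), values near 0 lean toward L2
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print("elastic net coefficients:", np.round(enet.coef_, 3))

# Sanity check: with l1_ratio=1.0 the fit should match Lasso at the same alpha
enet_l1 = ElasticNet(alpha=0.1, l1_ratio=1.0).fit(X, y)
lasso_same_alpha = Lasso(alpha=0.1).fit(X, y)
print(np.allclose(enet_l1.coef_, lasso_same_alpha.coef_))
```

Both `alpha` and `l1_ratio` are hyperparameters, so they are usually tuned together, for example with `ElasticNetCV`.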