# Regularization with SciKit-Learn

Regularization minimizes the RSS (residual sum of squares) *plus* a penalty term. The penalty grows with the size of the coefficients, so it discourages models whose coefficients are too large. Some regularization methods can shrink the coefficients of unhelpful features all the way to zero, in which case the model ignores those features entirely.

Let's explore two methods of regularization: Ridge Regression and Lasso. We'll combine these with the polynomial feature set, since regularization is less effective on a feature set as small as the original X.
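To make the "RSS plus penalty" idea concrete, here is a sketch of the two objectives in their common textbook form. The symbols are assumptions introduced for illustration: $\beta_j$ are the $p$ feature coefficients, $\hat{y}_i$ is the model's prediction for row $i$, and $\alpha \ge 0$ is the penalty strength (the name scikit-learn uses for this hyperparameter). Implementations may scale the RSS term differently; scikit-learn's `Lasso`, for example, divides it by $2n$.

```latex
\begin{align}
\text{Ridge (L2 penalty):}\quad
  &\min_{\beta}\; \sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2
    + \alpha\sum_{j=1}^{p}\beta_j^{2} \\
\text{Lasso (L1 penalty):}\quad
  &\min_{\beta}\; \sum_{i=1}^{n}\left(y_i-\hat{y}_i\right)^2
    + \alpha\sum_{j=1}^{p}\left|\beta_j\right|
\end{align}
```

With $\alpha = 0$ both reduce to ordinary least squares. The absolute-value penalty is what allows Lasso to push some coefficients exactly to zero, while the squared penalty in Ridge only shrinks them toward zero.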

## Imports

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

In [2]:
from google.colab import drive 
drive.mount('/content/drive')

Mounted at /content/drive


## Data and Setup

In [3]:
df = pd.read_csv("/content/drive/MyDrive/Datasets/Advertising.csv")
X = df.drop('sales', axis=1)  # features: every column except the target
y = df['sales']               # target: sales

### Train | Test Split

In [4]:
from sklearn.model_selection import train_test_split

In [8]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)

----

## Scaling the Data

While every feature in our particular data set is on the same order of magnitude (thousands of dollars spent), that typically won't be the case. Because the penalty term in a regularized model sums over the coefficients, and the size of each coefficient depends on the scale of its feature, it's important to standardize the features. Review the theory videos for more information, including a discussion of why we **fit** the scaler only on the training data and then **transform** the training and test sets separately.
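The fit-on-train, transform-both pattern can be sketched on made-up data. Everything below is an illustration under stated assumptions: the `*_demo` names and the synthetic two-column matrix are hypothetical and are not part of the Advertising data.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a feature matrix whose columns are on very different scales
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.normal(loc=150.0, scale=80.0, size=200),  # large-scale feature
    rng.normal(loc=0.5, scale=0.1, size=200),     # small-scale feature
])

X_demo_train, X_demo_test = train_test_split(X_demo, test_size=0.3, random_state=101)

demo_scaler = StandardScaler()
demo_scaler.fit(X_demo_train)                      # statistics come from the training rows only
scaled_train = demo_scaler.transform(X_demo_train)
scaled_test = demo_scaler.transform(X_demo_test)   # reuses the training mean and std

print(scaled_train.mean(axis=0))  # approximately 0 for every column
print(scaled_train.std(axis=0))   # approximately 1 for every column
print(scaled_test.mean(axis=0))   # near 0, but generally not exactly 0
```

The test set ends up only approximately standardized because it never contributed to the scaler's statistics, which is the point: no information from the test rows leaks into the preprocessing.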

In [9]:
from sklearn.preprocessing import StandardScaler

In [10]:
scaler = StandardScaler()

In [11]:
scaler.fit(X_train)  # learn each feature's mean and standard deviation from the training data only

StandardScaler()

In [12]:
X_train = scaler.transform(X_train)

In [13]:
X_test = scaler.transform(X_test)  # scaled with the statistics learned from the training data

## Ridge Regression

In [14]:
from sklearn.linear_model import Ridge

In [15]:
ridge_model = Ridge(alpha=10)

In [16]:
ridge_model.fit(X_train,y_train)

Ridge(alpha=10)

In [17]:
test_predictions = ridge_model.predict(X_test)

In [18]:
from sklearn.metrics import mean_absolute_error, mean_squared_error
MAE = mean_absolute_error(y_test, test_predictions)
MSE = mean_squared_error(y_test, test_predictions)
RMSE = np.sqrt(MSE)  # square root puts the error back in the units of sales



In [19]:
MAE

1.2587692456910873

In [20]:
MSE

2.582186472392257

In [21]:
RMSE

1.6069183154075557
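As a reminder of what these three numbers measure, here is a minimal sketch that computes them by hand with NumPy and compares the results with scikit-learn. The target and prediction arrays are made up for illustration and are unrelated to the Advertising data.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error

# Made-up targets and predictions, used only to illustrate the formulas
y_true_demo = np.array([10.0, 12.0, 15.0, 20.0])
y_pred_demo = np.array([11.0, 11.0, 17.0, 18.0])

errors = y_true_demo - y_pred_demo      # [-1, 1, -2, 2]

mae_manual = np.mean(np.abs(errors))    # (1 + 1 + 2 + 2) / 4 = 1.5
mse_manual = np.mean(errors ** 2)       # (1 + 1 + 4 + 4) / 4 = 2.5
rmse_manual = np.sqrt(mse_manual)       # square root of the MSE

print(mae_manual, mean_absolute_error(y_true_demo, y_pred_demo))
print(mse_manual, mean_squared_error(y_true_demo, y_pred_demo))
print(rmse_manual)
```

Because squaring weights large errors more heavily, RMSE is never smaller than MAE on the same predictions, which matches the Ridge results above.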

### Choosing an alpha value with Cross-Validation

In [22]:
from sklearn.linear_model import RidgeCV

In [23]:
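# Score each candidate alpha by cross-validation and keep the one with the best (least negative) MAE
ridge_cv_model = RidgeCV(alphas=(0.1, 1.0, 10.0), scoring='neg_mean_absolute_error')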

In [24]:
ridge_cv_model.fit(X_train,y_train)

RidgeCV(alphas=array([ 0.1,  1. , 10. ]), scoring='neg_mean_absolute_error')

In [25]:
ridge_cv_model.alpha_

0.1
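The selected value, 0.1, is the smallest one on the grid, which may mean that an even smaller alpha would score better. One possible follow-up is to search a wider, log-spaced grid. The sketch below does this on synthetic data; `alpha_grid`, `demo_cv_model`, and the `make_regression` data are assumptions for illustration, not part of the original notebook. When no `cv` argument is given, `RidgeCV` uses an efficient form of leave-one-out cross-validation.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import StandardScaler

# Synthetic regression data standing in for the scaled training set
X_demo, y_demo = make_regression(n_samples=140, n_features=3, noise=10.0, random_state=101)
X_demo = StandardScaler().fit_transform(X_demo)

# Hypothetical wider grid: 13 values from 0.001 to 1000, evenly spaced on a log scale
alpha_grid = np.logspace(-3, 3, 13)

demo_cv_model = RidgeCV(alphas=alpha_grid, scoring='neg_mean_absolute_error')
demo_cv_model.fit(X_demo, y_demo)

print(demo_cv_model.alpha_)       # the grid value with the best cross-validated score
print(demo_cv_model.best_score_)  # that score (negative MAE, so closer to 0 is better)
```

If the winner again sits at an edge of the grid, that is a hint to extend the grid further in that direction.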

In [26]:
test_predictions = ridge_cv_model.predict(X_test)

In [27]:
MAE = mean_absolute_error(y_test,test_predictions)
MSE = mean_squared_error(y_test,test_predictions)
RMSE = np.sqrt(MSE)

In [28]:
MAE

1.2140617628441148

In [29]:
MSE

2.3006527003556045

In [30]:
RMSE

1.5167902624804803

In [31]:
ridge_cv_model.coef_

array([ 3.76333559,  2.76339947, -0.00593674])
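The third coefficient above is close to zero but not exactly zero, which is typical of Ridge: it shrinks coefficients without eliminating them. To see how alpha controls the amount of shrinkage, the following sketch fits Ridge at three strengths on synthetic data and records the overall size (Euclidean norm) of the coefficient vector. The data and the `*_demo` names are assumptions for illustration only.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge

# Synthetic regression problem with three features
X_demo, y_demo = make_regression(n_samples=140, n_features=3, noise=10.0, random_state=101)

coef_norms = []
for alpha in [0.1, 10.0, 1000.0]:
    model = Ridge(alpha=alpha).fit(X_demo, y_demo)
    norm = np.linalg.norm(model.coef_)   # overall size of the coefficient vector
    coef_norms.append(norm)
    print(alpha, model.coef_, norm)
```

The printed norms get smaller as alpha grows: a stronger penalty pulls all coefficients closer to zero, trading some fit on the training data for a simpler model.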