# Regularization of Linear Models with SKLearn
This exercise is about regularization. The first part uses SKLearn.
Earlier in this course you saw that increasing the order of a polynomial may result in overfitting. This exercise investigates the impact of regularization and how it relates to polynomial fitting of data.


Let’s import the necessary libraries and load the training dataset.


In [15]:
#imports
import numpy as np
import pandas as pd
import math
import warnings

warnings.filterwarnings("ignore")

from sklearn.model_selection import train_test_split

from sklearn.linear_model import LinearRegression
from sklearn.linear_model import Ridge
from sklearn.linear_model import Lasso
from sklearn.linear_model import ElasticNet
from sklearn.metrics import mean_squared_error

from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

import matplotlib.pyplot as plt
import seaborn as sns

sns.set()
%matplotlib inline


The next step is to split the dataset into a training set and a validation set (called the test set in the code). 30% of the data will be used for validation. Passing an int as "random_state" to the "train_test_split" function gives reproducible output across multiple function calls.
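
The effect of "random_state" can be checked with a minimal sketch. The toy arrays `X_demo` and `y_demo` below are made up for illustration and are not the dataset used in this exercise.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data, purely illustrative (not the dataset used in this exercise)
X_demo = np.arange(20).reshape(10, 2)
y_demo = np.arange(10)

# Calling the function twice with the same random_state gives the same split
split_a = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)
split_b = train_test_split(X_demo, y_demo, test_size=0.3, random_state=42)

same_split = all(np.array_equal(a, b) for a, b in zip(split_a, split_b))
n_train, n_test = len(split_a[0]), len(split_a[1])
print('Identical splits:', same_split)
print('Train/Test size:', n_train, n_test)
```

Leaving "random_state" unset would typically give a different split on every call, which makes the scores below hard to compare between runs.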


In [16]:
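# Boston housing data: each record is spread over two consecutive lines in the file
data_url = "http://lib.stat.cmu.edu/datasets/boston"
raw_df = pd.read_csv(data_url, sep=r"\s+", skiprows=22, header=None)
# Even rows hold the first 11 features, odd rows hold the last 2 features and the target
X = np.hstack([raw_df.values[::2, :], raw_df.values[1::2, :2]])
y = raw_df.values[1::2, 2]
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.3)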


This exercise uses a linear regression model as baseline.


In [17]:

lr_model = LinearRegression()
lr_model.fit(X_train, y_train)

print('Training score: {}'.format(lr_model.score(X_train, y_train)))
print('Test score: {}'.format(lr_model.score(X_test, y_test)))

y_pred_train = lr_model.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)

print('RMSE_train: {}'.format(rmse_train))

y_pred_test = lr_model.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)

print('RMSE_test: {}'.format(rmse_test))


Training score: 0.7434997532004697
Test score: 0.7112260057484974
RMSE_train: 4.748208239685937
RMSE_test: 4.638689926172788


The linear model obtains a training score and a test score (the coefficient of determination $R^2$ returned by "score") of about 71%-74% and an RMSE of about 4.7.
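
For a regressor, "score" returns the coefficient of determination $R^2$ rather than a classification accuracy. The sketch below recomputes it by hand on synthetic data (`X_demo`, `y_demo` and the coefficients are assumptions made for illustration) and shows how it sits next to the RMSE.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Synthetic regression problem, purely illustrative
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 3))
y_demo = X_demo @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)

model = LinearRegression().fit(X_demo, y_demo)
y_hat = model.predict(X_demo)

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_demo - y_hat) ** 2)
ss_tot = np.sum((y_demo - y_demo.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

# RMSE is in the units of the target, R^2 is unitless
rmse = np.sqrt(mean_squared_error(y_demo, y_hat))

print('R^2 by hand: {}'.format(r2_manual))
print('model.score: {}'.format(model.score(X_demo, y_demo)))
print('RMSE: {}'.format(rmse))
```

An $R^2$ of 1 means a perfect fit, while 0 corresponds to always predicting the mean of the target.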

The next step is to fit a model to the data using the "steps" of a pipeline: scale the data, create polynomial features, and then train a linear regression model.

The first step normalizes the inputs by mean centering and scaling to unit variance. This lets us work with reasonable numbers when we raise the inputs to a power.
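
As a rough illustration of these two preprocessing steps, the sketch below uses a random matrix `X_demo` with 13 columns (the same number of features as `X` above, but otherwise made up). It checks that the scaled columns have zero mean and unit variance, and counts how many features a degree 2 expansion produces.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Random stand-in for the 13 input features, purely illustrative
rng = np.random.default_rng(0)
X_demo = rng.normal(loc=50.0, scale=20.0, size=(200, 13))

# Step 1: mean centering and scaling to unit variance
X_scaled = StandardScaler().fit_transform(X_demo)

# Step 2: all monomials up to degree 2 (bias, linear, squares, pairwise products)
X_poly = PolynomialFeatures(degree=2).fit_transform(X_scaled)

print('Largest squared raw value: {}'.format((X_demo ** 2).max()))
print('Largest squared scaled value: {}'.format((X_scaled ** 2).max()))
print('Shape after polynomial expansion: {}'.format(X_poly.shape))
```

With 13 inputs, the degree 2 expansion has 1 + 13 + 91 = 105 columns, so the linear model in the last step has far more coefficients to fit than the baseline.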






In [18]:
steps = [
    ('scalar', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', LinearRegression())
]

pipeline = Pipeline(steps)

pipeline.fit(X_train, y_train)

y_pred_train = pipeline.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = pipeline.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}\n'.format(rmse_test))

print('Training score: {}'.format(pipeline.score(X_train, y_train)))
print('Test score: {}'.format(pipeline.score(X_test, y_test)))


RMSE_train: 2.162970056950185
RMSE_test: 5.0811876783255245

Training score: 0.9467733311147442
Test score: 0.6535042863861226


After running the code, you will get a training score of about 95% and a test score of about 65%. The large gap between the two is a sign of overfitting. Overfitting is normally not desirable, but it is exactly what we were hoping for in this example.

You will now apply regularization to the model.
## L2 Regularization or Ridge Regression
Recall what happens when the model coefficients are learned with gradient descent: the weights are updated using the learning rate and the gradient, as mentioned in the lecture about non-linear optimization. Ridge regression adds a penalty term to the objective function,

$$\frac{1}{2} \sum_{n=1}^{N}\left\{y_{n}-\theta^{\top} \boldsymbol{\phi}\left(\mathbf{x}_{n}\right)\right\}^{2}+\frac{\alpha}{2}\|\theta\|_2^2$$

The importance of the regularization term can be tuned by changing $\alpha$. The larger the value of $\alpha$, the less variance (variability of the model prediction for a given data point) your model will exhibit, at the cost of more bias.
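
The shrinking effect of $\alpha$ can be sketched with the closed-form minimizer of the ridge objective, $\theta = (\Phi^{\top}\Phi + \alpha I)^{-1}\Phi^{\top}y$, where $\Phi$ stacks the feature vectors row by row. The helper `ridge_closed_form` and the synthetic data are assumptions made for illustration. The intercept is switched off so that `Ridge` solves the same problem.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Synthetic design matrix and targets, purely illustrative
rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 5))
y_demo = Phi @ np.array([3.0, -2.0, 0.0, 1.0, 0.5]) + rng.normal(scale=0.3, size=50)

def ridge_closed_form(Phi, y, alpha):
    """Hypothetical helper: minimizer of 0.5*||y - Phi theta||^2 + 0.5*alpha*||theta||^2."""
    n_features = Phi.shape[1]
    return np.linalg.solve(Phi.T @ Phi + alpha * np.eye(n_features), Phi.T @ y)

alphas = [0.01, 1.0, 10.0, 100.0]
norms = []
max_diffs = []
for alpha in alphas:
    theta = ridge_closed_form(Phi, y_demo, alpha)
    # fit_intercept=False so that sklearn minimizes the same objective
    sk_theta = Ridge(alpha=alpha, fit_intercept=False).fit(Phi, y_demo).coef_
    max_diffs.append(np.max(np.abs(theta - sk_theta)))
    norms.append(np.linalg.norm(theta))
    print('alpha={}: ||theta||_2 = {}'.format(alpha, norms[-1]))
```

The norm of the coefficient vector decreases as $\alpha$ grows, which is the mechanism behind the lower variance.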


In [19]:
steps = [
    ('scalar', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Ridge(alpha=10, fit_intercept=True))
]

ridge_pipe = Pipeline(steps)
ridge_pipe.fit(X_train, y_train)

y_pred_train = ridge_pipe.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = ridge_pipe.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}'.format(rmse_test))

print('Training Score: {}'.format(ridge_pipe.score(X_train, y_train)))
print('Test Score: {}'.format(ridge_pipe.score(X_test, y_test)))


RMSE_train: 2.441071076959751
RMSE_test: 3.823376123713985
Training Score: 0.9322063334864212
Test Score: 0.8038169683868278


The ridge regression model achieves a training score of about 93% and a test score of about 80%. The test score is an improvement over both the baseline linear regression model (about 71%) and the unregularized polynomial model (about 65%).

## L1 Regularization or Lasso Regression
A pipeline similar to the one in the Ridge regression example is created, but this time using Lasso. The objective function for Lasso regression uses the 1-norm of the parameters.

$$\frac{1}{2} \sum_{n=1}^{N}\left\{y_{n}-\theta^{\top} \boldsymbol{\phi}\left(\mathbf{x}_{n}\right)\right\}^{2}+\frac{\alpha}{2}\|\theta\|_1$$
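
A practical difference from the 2-norm penalty is that the 1-norm penalty can set coefficients exactly to zero. The sketch below compares the two on synthetic data where only three of ten inputs matter; the data and the value of `alpha` are assumptions made for illustration. Note that SKLearn's `Lasso` scales the squared error by $1/(2N)$, so its `alpha` is not on the same scale as $\alpha$ in the formula above.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: only the first three inputs influence the target
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(100, 10))
true_coef = np.array([4.0, -3.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
y_demo = X_demo @ true_coef + rng.normal(scale=0.5, size=100)

lasso = Lasso(alpha=0.3).fit(X_demo, y_demo)
ridge = Ridge(alpha=0.3).fit(X_demo, y_demo)

# Count coefficients that are exactly zero in each model
n_zero_lasso = int(np.sum(lasso.coef_ == 0.0))
n_zero_ridge = int(np.sum(ridge.coef_ == 0.0))

print('Lasso coefficients: {}'.format(np.round(lasso.coef_, 3)))
print('Ridge coefficients: {}'.format(np.round(ridge.coef_, 3)))
print('Exact zeros (Lasso, Ridge): {}, {}'.format(n_zero_lasso, n_zero_ridge))
```

Ridge shrinks all coefficients but typically leaves them nonzero, whereas Lasso tends to drop the irrelevant inputs altogether.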


In [20]:
steps = [
    ('scalar', StandardScaler()),
    ('poly', PolynomialFeatures(degree=2)),
    ('model', Lasso(alpha=0.3, fit_intercept=True))
]

lasso_pipe = Pipeline(steps)

lasso_pipe.fit(X_train, y_train)

y_pred_train = lasso_pipe.predict(X_train)
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = lasso_pipe.predict(X_test)
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}'.format(rmse_test))

print('Training score: {}'.format(lasso_pipe.score(X_train, y_train)))
print('Test score: {}'.format(lasso_pipe.score(X_test, y_test)))


RMSE_train: 3.538738418298479
RMSE_test: 3.970165571442558
Training score: 0.8575294192309941
Test score: 0.7884638325042947


# Tasks
In Exercise week 9 task 3, you were supposed to find the optimal polynomial model. In this task you will use that polynomial model and then extend it with ridge and lasso regression:
1. Calculate the RMSE on the training and test sets (as you did in Exercise week 9) for the "optimal" polynomial.
2. Use **ridge regression** to estimate a polynomial of degree 10 and calculate the RMSE on the training and test sets.
3. Use the model in question 2 to find the optimal $\alpha$ (around 0.0001) using RMSE.
4. Use the same polynomial, but this time apply **Lasso regression** to find the optimal value of $\alpha$ (around 0.001) using RMSE on the training and test sets.
5. The experiments show that the regularized models (ridge regression and Lasso regression) perform better in terms of RMSE on the test set. Recall that higher-degree terms (more complexity) push the model towards overfitting. Why, then, does the regularized model (which uses a polynomial of higher degree) perform better on the test set, i.e. overfit less?

Optimal model Polynomial:
$w_0+w_1X+w_2X^2+w_3X^3$

Regularized model Polynomial:
$wr_0+wr_1X+wr_2X^2+wr_3X^3+...+wr_{10}X^{10}$
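
One possible way to organize the search for $\alpha$ in task 3 is sketched below. It is not the reference solution: the helper `rmse_for_alpha` and the grid of candidate values are assumptions, and the data is regenerated in the same way as in the starter code so that the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

def true_fun(X):
    return np.cos(1.5 * np.pi * X)

# Same data generation and split as in the starter code
np.random.seed(0)
n_samples = 30
X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.60, test_size=0.40, random_state=1)

def rmse_for_alpha(alpha, degree=10):
    """Hypothetical helper: fit a degree-10 ridge model and return (RMSE_train, RMSE_test)."""
    model = Pipeline([
        ("polynomial_features", PolynomialFeatures(degree=degree, include_bias=False)),
        ("ridge", Ridge(alpha=alpha)),
    ])
    model.fit(X_train[:, np.newaxis], y_train)
    rmse_train = np.sqrt(mean_squared_error(y_train, model.predict(X_train[:, np.newaxis])))
    rmse_test = np.sqrt(mean_squared_error(y_test, model.predict(X_test[:, np.newaxis])))
    return rmse_train, rmse_test

# Assumed grid: 13 values, logarithmically spaced between 1e-6 and 1
alphas = np.logspace(-6, 0, 13)
results = [(a, *rmse_for_alpha(a)) for a in alphas]

# Pick the alpha with the lowest RMSE on the test set
best_alpha, best_train, best_test = min(results, key=lambda r: r[2])
print('Best alpha: {}'.format(best_alpha))
print('RMSE_train: {}'.format(best_train))
print('RMSE_test: {}'.format(best_test))
```

Swapping `Ridge` for `Lasso` in the helper gives a starting point for task 4. A logarithmic grid is used because useful values of $\alpha$ usually span several orders of magnitude.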


In [21]:
#use this code for your solutions 
from sklearn.model_selection import train_test_split

# cosine function used as the ground truth
def true_fun(X):
    return np.cos(1.5 * np.pi * X)

np.random.seed(0)

n_samples = 30

X = np.sort(np.random.rand(n_samples))
y = true_fun(X) + np.random.randn(n_samples) * 0.1


X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.60, test_size=0.40, random_state=1)
print('Train/Test Size : ', X_train.shape, X_test.shape, y_train.shape, y_test.shape)
degree=10
polynomial_features = PolynomialFeatures(degree=degree, include_bias=False)
linear_regression = LinearRegression()
pipeline = Pipeline(
    [
        ("polynomial_features", polynomial_features),
        ("linear_regression", linear_regression),
    ]
)
pipeline.fit(X_train[:, np.newaxis], y_train)

y_pred_train = pipeline.predict(X_train[:, np.newaxis])
mse_train= mean_squared_error(y_train, y_pred_train)
rmse_train = math.sqrt(mse_train)
print('RMSE_train: {}'.format(rmse_train))

y_pred_test = pipeline.predict(X_test[:, np.newaxis])
mse_test = mean_squared_error(y_test, y_pred_test)
rmse_test = math.sqrt(mse_test)
print('RMSE_test: {}\n'.format(rmse_test))


Train/Test Size :  (18,) (12,) (18,) (12,)
RMSE_train: 0.056532247455264445
RMSE_test: 0.30856057840599427



In [36]:
# Copy paste your  code here


Train/Test Size :  (18,) (12,) (18,) (12,)
RMSE_train: 0.1079839948207335
RMSE_test: 0.12863714132664852



In [37]:
# Copy paste your  code here


RMSE_train: 0.10126814910229279
RMSE_test: 0.11689857032619674



In [38]:
# Copy paste your code here


RMSE_train: 0.10041317401921453
RMSE_test: 0.11953621236626978

