# More Regularization

Term 1 2020 - Instructor: Teerapong Leelanupab

Teaching Assistants:
1. Tiwipab Meephruek (Mil)
2. Jiratkul Wangsiripaisarn (Brooklyn)
3. Hataichanok Sakkara (Pond)

***

## Importing libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import model_selection, preprocessing
from sklearn.linear_model import LinearRegression, Ridge, Lasso, RidgeCV, LassoCV
from sklearn.metrics import mean_squared_error

## Generating data

In [None]:
np.random.seed(42)
x = np.sort(np.random.rand(100))
y = np.cos(1.2 * x * np.pi) + (0.1 * np.random.randn(100))

- Random numbers are added to y as noise, since real-world data will not fit a curve exactly.

In [None]:
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(x, y, test_size = 0.2)

- The data is split into training and test sets. The test set is 20% of the data.

## Plotting the generated data

In [None]:
fig = plt.figure(figsize = (10,10))
sns.set(style = 'whitegrid')
plt.scatter(X_train, Y_train, color = 'k', label = 'Train data')
plt.scatter(X_test, Y_test, color = 'r', label = 'Test data')
plt.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True-fit')
plt.xlabel('x')
plt.ylabel('y')
plt.legend()

- The training data and test data are plotted.
- The true fit line is the actual function from which we generated the data.

## Using an equation of degree 1.

- I will be using a hypothesis made from an equation of degree 1 for the model.

In [None]:
x_train = X_train.reshape(-1,1)
clf = LinearRegression()
clf.fit(x_train, Y_train)
train_accuracy = clf.score(x_train, Y_train)
print('train accuracy', train_accuracy)

In [None]:
x_test = X_test.reshape(-1,1)
test_accuracy = clf.score(x_test, Y_test)
print('test accuracy', test_accuracy)

In [None]:
train_predict = clf.predict(x_train)
train_MSE = mean_squared_error(Y_train, train_predict)
print('Training MSE:', train_MSE)

In [None]:
test_predict = clf.predict(x_test)
test_MSE = mean_squared_error(Y_test, test_predict)
print('Test MSE:', test_MSE)

In [None]:
fig = plt.figure(figsize = (20,10))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.scatter(X_train, Y_train, color = 'k', label = 'Training examples')
ax1.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax1.plot(X_test, test_predict, label = 'Model function' )
ax1.legend()
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax2.scatter(X_test, Y_test, color = 'r', label = 'Testing examples')
ax2.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax2.scatter(X_test, test_predict,color = 'k', label = 'Model predictions')
plt.legend()
ax2.set_xlabel('x')
ax2.set_ylabel('y')

- The model is under-fit as it is not able to fit the training examples correctly. 
- The training and testing error are also quite high.
- Increasing the degree of the equation may do the job.
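
The values printed as "accuracy" above come from `LinearRegression.score`, which for regressors returns the coefficient of determination (R²) rather than a classification accuracy. The sketch below checks this on a small synthetic data set; the names `x_demo`, `y_demo` and `model` are illustrative and are not part of the notebook's own variables.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Small synthetic data set in the same spirit as the notebook's data (illustrative only)
rng = np.random.RandomState(0)
x_demo = np.sort(rng.rand(50))
y_demo = np.cos(1.2 * x_demo * np.pi) + 0.1 * rng.randn(50)
X_demo = x_demo.reshape(-1, 1)

model = LinearRegression().fit(X_demo, y_demo)

# score() and r2_score() compute the same quantity
score = model.score(X_demo, y_demo)
r2 = r2_score(y_demo, model.predict(X_demo))
print('score():', score)
print('r2_score():', r2)
```

An R² of 1 means a perfect fit, and values can be negative on test data when the model does worse than predicting the mean.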

## Using an equation of degree 2

In [None]:
x_train = X_train.reshape(-1,1)
transf = preprocessing.PolynomialFeatures(degree = 2)
x_train = transf.fit_transform(x_train)
clf = LinearRegression()
clf.fit(x_train, Y_train)
train_accuracy = clf.score(x_train, Y_train)
print('Train accuracy:', train_accuracy)

In [None]:
x_test = X_test.reshape(-1,1)
transf = preprocessing.PolynomialFeatures(degree = 2)
x_test = transf.fit_transform(x_test)
test_accuracy = clf.score(x_test, Y_test)
print('test accuracy', test_accuracy)

In [None]:
train_predict = clf.predict(x_train)
train_MSE = mean_squared_error(Y_train, train_predict)
print('Training MSE:', train_MSE)

In [None]:
test_predict = clf.predict(x_test)
test_MSE = mean_squared_error(Y_test, test_predict)
print('Test MSE:', test_MSE)

In [None]:
x_model = x.reshape(-1,1)
x_model = transf.fit_transform(x_model)
y_model = clf.predict(x_model)
x_test = X_test

In [None]:
fig = plt.figure(figsize = (20,10))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.scatter(X_train, Y_train, color = 'k', label = 'Training examples')
ax1.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax1.plot(x, y_model, label = 'Model function', linewidth = 3 )
ax1.legend()
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax2.scatter(X_test, Y_test, color = 'r', label = 'Testing examples')
ax2.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax2.scatter(X_test, test_predict,color = 'k', label = 'Model predictions')
plt.legend()
ax2.set_xlabel('x')
ax2.set_ylabel('y')
plt.show()

- The 2nd-degree equation fits the data better than the 1st-degree equation.
- The errors have been reduced significantly.
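
The degree-2 fit relies on `PolynomialFeatures`, which expands each input value into a row of powers so that an ordinary linear model can fit a polynomial. A minimal sketch of what the transformer produces for two hypothetical inputs, 2 and 3:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Two illustrative input values, shaped as a single-feature column
x_small = np.array([[2.0], [3.0]])

# Degree 2 produces the columns [1, x, x^2]
expanded = PolynomialFeatures(degree=2).fit_transform(x_small)
print(expanded)
```

The first column is the constant bias term, which is why the model ends up with one more parameter than the degree.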

## Using an equation of degree 20

In [None]:
x_train = X_train.reshape(-1,1)
transf = preprocessing.PolynomialFeatures(degree = 20)
x_train = transf.fit_transform(x_train)
clf = LinearRegression()
clf.fit(x_train, Y_train)
train_accuracy = clf.score(x_train, Y_train)
print('Train accuracy:', train_accuracy)

In [None]:
x_test = X_test.reshape(-1,1)
transf = preprocessing.PolynomialFeatures(degree = 20)
x_test = transf.fit_transform(x_test)
test_accuracy = clf.score(x_test, Y_test)
print('test accuracy', test_accuracy)

In [None]:
train_predict = clf.predict(x_train)
train_MSE = mean_squared_error(Y_train, train_predict)
print('Training MSE:', train_MSE)

In [None]:
test_predict = clf.predict(x_test)
test_MSE = mean_squared_error(Y_test, test_predict)
print('Test MSE:', test_MSE)

In [None]:
x_model = x.reshape(-1,1)
x_model = transf.fit_transform(x_model)
y_model = clf.predict(x_model)
x_test = X_test

In [None]:
fig = plt.figure(figsize = (20,10))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.scatter(X_train, Y_train, color = 'k', label = 'Training examples')
ax1.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax1.plot(x, y_model, label = 'Model function', linewidth = 3 )
ax1.legend()
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax2.scatter(X_test, Y_test, color = 'r', label = 'Testing examples')
ax2.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax2.scatter(X_test, test_predict,color = 'k', label = 'Model predictions')
plt.legend()
ax2.set_xlabel('x')
ax2.set_ylabel('y')

- The model function is badly distorted: its high degree makes it flexible enough to try to pass through all the training examples.
- As the training data has some amount of noise, it will end up capturing that noise and will be misled by that noise when it tries to make predictions on the test data.

- The optimum model would be the one fitting the data without under-fitting or over-fitting it.
- Such an optimal model can be decided from the scores of Cross-validation and also by checking the MSE of the models using equations of different degrees.
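
One way to use cross-validation scores for this choice is sketched below. It is an illustration on synthetic data rather than the notebook's own procedure: the names `x_demo`, `y_demo` and `cv_mse`, the degree range 1 to 10, and the 5-fold setting are all assumptions.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data similar to the notebook's (left unsorted so folds are not ordered by x)
rng = np.random.RandomState(0)
x_demo = rng.rand(80)
y_demo = np.cos(1.2 * x_demo * np.pi) + 0.1 * rng.randn(80)
X_demo = x_demo.reshape(-1, 1)

cv_mse = {}
for degree in range(1, 11):
    # The pipeline expands the features and fits the linear model inside each fold
    pipe = make_pipeline(PolynomialFeatures(degree=degree), LinearRegression())
    scores = cross_val_score(pipe, X_demo, y_demo, cv=5,
                             scoring='neg_mean_squared_error')
    cv_mse[degree] = -scores.mean()  # flip the sign back to a positive MSE

best_degree = min(cv_mse, key=cv_mse.get)
print('Degree with lowest cross-validated MSE:', best_degree)
```

Choosing the degree this way leaves the test set untouched, so the test MSE remains an honest estimate of performance on unseen data.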

## Checking MSE for different degrees

In [None]:
def Evaluation(degree):
    x_train = X_train.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = degree)
    x_train = transf.fit_transform(x_train)
    clf = LinearRegression()
    clf.fit(x_train, Y_train)
    train_accuracy = clf.score(x_train, Y_train)
    x_test = X_test.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = degree)
    x_test = transf.fit_transform(x_test)
    test_accuracy = clf.score(x_test, Y_test)
    train_predict = clf.predict(x_train)
    train_MSE = mean_squared_error(Y_train, train_predict)
    test_predict = clf.predict(x_test)
    test_MSE = mean_squared_error(Y_test, test_predict)
    return train_accuracy, test_accuracy, train_MSE, test_MSE

In [None]:
Train_acc = []
Test_acc = []
Train_MSE = []
Test_MSE = []

In [None]:
for i in range(40):
    a, b, c, d = Evaluation(i+1)
    Train_acc.append(a)
    Test_acc.append(b)
    Train_MSE.append(c)
    Test_MSE.append(d)

In [None]:
degrees = np.linspace(1, 40, 40)
fig = plt.figure(figsize = (20,10))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.plot(degrees, Train_acc, label = 'Training accuracy', linewidth = 3)
ax1.plot(degrees, Test_acc, label = 'Testing accuracy', linewidth = 3)
ax1.legend()
ax1.set_xlabel('Degrees')
ax1.set_ylabel('Accuracy')
ax2.plot(degrees, Train_MSE, label = 'Training MSE', linewidth = 3)
ax2.plot(degrees, Test_MSE, label = 'Testing MSE', linewidth = 3)
plt.ylim(0, 0.05)
plt.legend()
ax2.set_xlabel('Degrees')
ax2.set_ylabel('MSE')

In [None]:
Test_min_degree = Test_MSE.index(min(Test_MSE)) + 1
print('Minimum test error occurs at degree', Test_min_degree )

In [None]:
Train_min_degree = Train_MSE.index(min(Train_MSE)) + 1
print('Minimum training error occurs at degree', Train_min_degree )

- The MSE and score were computed for models of every degree from 1 to 40. The test error is lowest at degree 9 and the training error is lowest at degree 27.
- The test error rises after degree 9 because the model begins to overfit and no longer predicts the test values well.

- Plotting the model using an equation of degree 9.

In [None]:
x_train = X_train.reshape(-1,1)
transf = preprocessing.PolynomialFeatures(degree = 9)
x_train = transf.fit_transform(x_train)
clf = LinearRegression()
clf.fit(x_train, Y_train)
train_accuracy = clf.score(x_train, Y_train)
print('Train accuracy:', train_accuracy)

In [None]:
x_test = X_test.reshape(-1,1)
transf = preprocessing.PolynomialFeatures(degree = 9)
x_test = transf.fit_transform(x_test)
test_accuracy = clf.score(x_test, Y_test)
print('test accuracy', test_accuracy)

In [None]:
train_predict = clf.predict(x_train)
train_MSE = mean_squared_error(Y_train, train_predict)
print('Training MSE:', train_MSE)

In [None]:
test_predict = clf.predict(x_test)
test_MSE = mean_squared_error(Y_test, test_predict)
print('Test MSE:', test_MSE)

In [None]:
x_model = x.reshape(-1,1)
x_model = transf.fit_transform(x_model)
y_model = clf.predict(x_model)
x_test = X_test

In [None]:
fig = plt.figure(figsize = (20,10))
ax1 = fig.add_subplot(1, 2, 1)
ax2 = fig.add_subplot(1, 2, 2)
ax1.scatter(X_train, Y_train, color = 'k', label = 'Training examples')
ax1.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax1.plot(x, y_model, label = 'Model function', linewidth = 3 )
ax1.legend()
ax1.set_xlabel('x')
ax1.set_ylabel('y')
ax2.scatter(X_test, Y_test, color = 'r', label = 'Testing examples')
ax2.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
ax2.scatter(X_test, test_predict,color = 'k', label = 'Model predictions')
plt.legend()
ax2.set_xlabel('x')
ax2.set_ylabel('y')

## Regularization to prevent over-fitting

- The two types of regularization used here are:
1. L1 regularization or LASSO regression
2. L2 regularization or Ridge regression
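
For reference, the objectives that scikit-learn's `Lasso` and `Ridge` estimators minimize can be written as below. Here $w$ is the coefficient vector, $n$ is the number of training samples, and $\alpha$ is scikit-learn's name for the regularization strength that this notebook calls lambda.

$$
\text{Lasso (L1):}\quad \min_{w}\ \frac{1}{2n}\,\lVert Xw - y\rVert_2^2 + \alpha\,\lVert w\rVert_1
$$

$$
\text{Ridge (L2):}\quad \min_{w}\ \lVert Xw - y\rVert_2^2 + \alpha\,\lVert w\rVert_2^2
$$

Because the two estimators scale the squared-error term differently, the same numerical value of $\alpha$ does not represent the same amount of regularization in both.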

## Ridge regression

- We will fit a Ridge regression model to the data above with various lambda values (passed to sklearn as `alpha`) and see how changing lambda affects the model.

In [None]:
def ridge_reg(lamda):
    x_train = X_train.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = 20)
    x_train = transf.fit_transform(x_train)
    clf = Ridge(alpha = lamda)
    clf.fit(x_train, Y_train)
    train_accuracy = clf.score(x_train, Y_train)
    intercept = clf.intercept_
    coefficient = clf.coef_
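    # Collect the intercept and coefficients in one array (adding them would shift every coefficient)
    parameters = np.append(intercept, coefficient)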
    x_test = X_test.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = 20)
    x_test = transf.fit_transform(x_test)
    test_accuracy = clf.score(x_test, Y_test)
    train_predict = clf.predict(x_train)
    train_MSE = mean_squared_error(Y_train, train_predict)
    test_predict = clf.predict(x_test)
    test_MSE = mean_squared_error(Y_test, test_predict)
    print('Train accuracy:', train_accuracy, '\n')
    print('Test accuracy:', test_accuracy, '\n')
    print('Train MSE', train_MSE, '\n')
    print('Test MSE', test_MSE, '\n')
    print('Parameters:', parameters)
    x_model = x.reshape(-1,1)
    x_model = transf.fit_transform(x_model)
    y_model = clf.predict(x_model)
    x_test = X_test
    fig = plt.figure(figsize = (20,10))
    ax1 = fig.add_subplot(1, 2, 1)
    ax2 = fig.add_subplot(1, 2, 2)
    ax1.scatter(X_train, Y_train, color = 'k', label = 'Training examples')
    ax1.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
    ax1.plot(x, y_model, label = 'Model function', linewidth = 3 )
    ax1.legend()
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax2.scatter(X_test, Y_test, color = 'r', label = 'Testing examples')
    ax2.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
    ax2.scatter(X_test, test_predict,color = 'k', label = 'Model predictions')
    plt.legend()
    ax2.set_xlabel('x')
    ax2.set_ylabel('y')

In [None]:
ridge_reg(0)

In [None]:
ridge_reg(0.5)    

In [None]:
ridge_reg(1)

In [None]:
ridge_reg(10)

In [None]:
ridge_reg(100)

In [None]:
ridge_reg(1000)

In [None]:
ridge_reg(10000)

In [None]:
ridge_reg(100000)

In [None]:
ridge_reg(1000000)

In [None]:
ridge_reg(100000000)

- As lambda increases, the coefficients shrink toward zero, so the model function flattens into a straight line parallel to the x-axis.
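
The flattening can also be checked numerically by looking at the size of the coefficient vector. The sketch below uses synthetic data and illustrative names (`x_demo`, `y_demo`, `coef_norms`), and the list of lambda values is an arbitrary choice for demonstration.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data similar to the notebook's (illustrative only)
rng = np.random.RandomState(0)
x_demo = np.sort(rng.rand(80))
y_demo = np.cos(1.2 * x_demo * np.pi) + 0.1 * rng.randn(80)
X_poly = PolynomialFeatures(degree=20).fit_transform(x_demo.reshape(-1, 1))

coef_norms = []
for alpha in [0.01, 1, 100, 10000, 1000000]:
    model = Ridge(alpha=alpha).fit(X_poly, y_demo)
    # Euclidean norm of the coefficients: the quantity the L2 penalty shrinks
    coef_norms.append(np.linalg.norm(model.coef_))
    print('lambda:', alpha, 'coefficient norm:', coef_norms[-1],
          'intercept:', model.intercept_)
```

With the coefficients driven close to zero, the prediction is essentially the intercept alone, which approaches the mean of the training targets.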

## Lasso Regression

- We will fit a LASSO regression model to the data above and see how changing lambda affects the model.

In [None]:
def lasso_reg(lamda):
    x_train = X_train.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = 20)
    x_train = transf.fit_transform(x_train)
    clf = Lasso(alpha = lamda)
    clf.fit(x_train, Y_train)
    intercept = clf.intercept_
    coefficient = clf.coef_
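    # Collect the intercept and coefficients in one array (adding them would shift every coefficient)
    parameters = np.append(intercept, coefficient)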
    train_accuracy = clf.score(x_train, Y_train)
    x_test = X_test.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = 20)
    x_test = transf.fit_transform(x_test)
    test_accuracy = clf.score(x_test, Y_test)
    train_predict = clf.predict(x_train)
    train_MSE = mean_squared_error(Y_train, train_predict)
    test_predict = clf.predict(x_test)
    test_MSE = mean_squared_error(Y_test, test_predict)
    print('Train accuracy:', train_accuracy, '\n')
    print('Test accuracy:', test_accuracy, '\n')
    print('Train MSE', train_MSE, '\n')
    print('Test MSE', test_MSE, '\n')
    print('Parameters:', parameters)
    x_model = x.reshape(-1,1)
    x_model = transf.fit_transform(x_model)
    y_model = clf.predict(x_model)
    x_test = X_test
    fig = plt.figure(figsize = (20,10))
    ax1 = fig.add_subplot(1, 2, 1)
    ax2 = fig.add_subplot(1, 2, 2)
    ax1.scatter(X_train, Y_train, color = 'k', label = 'Training examples')
    ax1.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
    ax1.plot(x, y_model, label = 'Model function', linewidth = 3 )
    ax1.legend()
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax2.scatter(X_test, Y_test, color = 'r', label = 'Testing examples')
    ax2.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
    ax2.scatter(X_test, test_predict,color = 'k', label = 'Model predictions')
    plt.legend()
    ax2.set_xlabel('x')
    ax2.set_ylabel('y')

In [None]:
lasso_reg(0.1)

In [None]:
lasso_reg(0.01)

In [None]:
lasso_reg(0.001)

In [None]:
lasso_reg(0.0005)

In [None]:
lasso_reg(1)

In [None]:
lasso_reg(10)

In [None]:
lasso_reg(100)
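
A characteristic of the L1 penalty is that it can set coefficients exactly to zero, which the printed parameters in the runs above may make hard to see at a glance. The sketch below counts the non-zero coefficients for a few lambda values on synthetic data. The names `x_demo`, `y_demo` and `nonzero_counts`, the lambda values, and the raised `max_iter` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data similar to the notebook's (illustrative only)
rng = np.random.RandomState(0)
x_demo = np.sort(rng.rand(80))
y_demo = np.cos(1.2 * x_demo * np.pi) + 0.1 * rng.randn(80)

# Leave out the constant column; Lasso fits its own intercept
X_poly = PolynomialFeatures(degree=20, include_bias=False).fit_transform(
    x_demo.reshape(-1, 1))

nonzero_counts = []
for alpha in [0.0005, 0.01, 0.1, 1]:
    # A larger max_iter gives coordinate descent more room to converge
    model = Lasso(alpha=alpha, max_iter=100000).fit(X_poly, y_demo)
    nonzero_counts.append(int(np.sum(model.coef_ != 0)))
    print('lambda:', alpha, 'non-zero coefficients:', nonzero_counts[-1])
```

Once lambda is large enough that every coefficient is zero, the model predicts the mean of the training targets for every input.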

## Ridge regression with Cross-Validation

- Here I am using the RidgeCV regressor from sklearn.

In [None]:
def ridge_reg_cv(lamda):
    x_train = X_train.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = 20)
    x_train = transf.fit_transform(x_train)
    clf = RidgeCV(alphas = lamda, cv = 5)
    clf.fit(x_train, Y_train)
    best_alpha = clf.alpha_
    cv_score = clf.best_score_
    train_accuracy = clf.score(x_train, Y_train)
    intercept = clf.intercept_
    coefficient = clf.coef_
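    # Collect the intercept and coefficients in one array (adding them would shift every coefficient)
    parameters = np.append(intercept, coefficient)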
    x_test = X_test.reshape(-1,1)
    transf = preprocessing.PolynomialFeatures(degree = 20)
    x_test = transf.fit_transform(x_test)
    test_accuracy = clf.score(x_test, Y_test)
    train_predict = clf.predict(x_train)
    train_MSE = mean_squared_error(Y_train, train_predict)
    test_predict = clf.predict(x_test)
    test_MSE = mean_squared_error(Y_test, test_predict)
    print('Best lambda:', best_alpha)
    print('CV score:', cv_score)
    print('Train accuracy:', train_accuracy, '\n')
    print('Test accuracy:', test_accuracy, '\n')
    print('Train MSE', train_MSE, '\n')
    print('Test MSE', test_MSE, '\n')
    print('Parameters:', parameters)
    x_model = x.reshape(-1,1)
    x_model = transf.fit_transform(x_model)
    y_model = clf.predict(x_model)
    x_test = X_test
    fig = plt.figure(figsize = (20,10))
    ax1 = fig.add_subplot(1, 2, 1)
    ax2 = fig.add_subplot(1, 2, 2)
    ax1.scatter(X_train, Y_train, color = 'k', label = 'Training examples')
    ax1.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
    ax1.plot(x, y_model, label = 'Model function', linewidth = 3 )
    ax1.legend()
    ax1.set_xlabel('x')
    ax1.set_ylabel('y')
    ax2.scatter(X_test, Y_test, color = 'r', label = 'Testing examples')
    ax2.plot(x, np.cos(1.2 * x * np.pi), linewidth = 3, label = 'True function')
    ax2.scatter(X_test, test_predict,color = 'k', label = 'Model predictions')
    plt.legend()
    ax2.set_xlabel('x')
    ax2.set_ylabel('y')

- I pass in a set of candidate lambda values and let cross-validation pick the best one among them.

In [None]:
ridge_reg_cv(np.array([0.0005, 0.001, 0.01, 0.1, 1, 10, 100]))

- The search can be refined by supplying a finer grid of lambda values near 0.0005.

In [None]:
ridge_reg_cv(np.array([0.0001, 0.0002, 0.0003, 0.0004, 0.0005]))
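
Instead of refining the candidate list by hand, a logarithmically spaced grid can cover several orders of magnitude in a single search. This is a sketch on synthetic data. The grid bounds, the number of grid points, and the names `x_demo`, `y_demo` and `alphas` are assumptions, not values taken from the notebook.

```python
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data similar to the notebook's (left unsorted so folds are not ordered by x)
rng = np.random.RandomState(0)
x_demo = rng.rand(80)
y_demo = np.cos(1.2 * x_demo * np.pi) + 0.1 * rng.randn(80)
X_poly = PolynomialFeatures(degree=20).fit_transform(x_demo.reshape(-1, 1))

# 25 candidate lambda values spread evenly on a log scale from 1e-4 to 1e2
alphas = np.logspace(-4, 2, 25)

model = RidgeCV(alphas=alphas, cv=5).fit(X_poly, y_demo)
print('Best lambda:', model.alpha_)
```

If the selected value sits at either end of the grid, the bounds are probably too narrow and the grid should be extended in that direction.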