In this notebook, I will test linear models such as:

    * Simple Linear Regression
    * Multiple Linear Regression
    * Ridge
    * Lasso
    * Elastic Net

[Sklearn Documentation](https://scikit-learn.org/stable/modules/linear_model.html#)

# Simple Linear Regression

In regression, the target value is expected to be a linear combination of the features. In mathematical notation, if $\hat{y}$ is the predicted value, then:

$$
\hat{y}(w, x) = w_0 + w_1 x_1 + ... + w_p x_p
$$

where $w = (w_1, ..., w_p)$ is the vector of coefficients and $w_0$ is the intercept.
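
To make the formula concrete, here is a minimal sketch that evaluates $\hat{y}$ for a single sample. The intercept, coefficients and feature values are hypothetical numbers chosen only for illustration; they do not come from the diabetes dataset.

```python
import numpy as np

# Hypothetical values, chosen only to illustrate the formula
w0 = 1.5                          # intercept w_0
w = np.array([2.0, -0.5, 0.25])   # coefficients w_1, ..., w_p
x = np.array([1.0, 4.0, 8.0])     # one sample with p = 3 features

# y_hat = w_0 + w_1 * x_1 + ... + w_p * x_p
y_hat = w0 + np.dot(w, x)
print("Predicted value:", y_hat)
```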

Linear Regression fits a linear model with coefficients $w = (w_1, ..., w_p)$ to minimize the residual sum of squares between the observed targets in the dataset, and the targets predicted by the linear approximation. Mathematically it solves a problem of the form:

$$
\min_{w} || X w - y||_2^2
$$
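
As a rough illustration of this minimization, the sketch below solves the least-squares problem directly with NumPy on synthetic data. The data-generating values (intercept 3, coefficients 2 and -1, noise level 0.1) are assumptions made up for this example. A column of ones is appended to the features so that the intercept is estimated together with the coefficients.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): y = 3 + 2*x1 - 1*x2 + noise
n_samples = 200
X = rng.normal(size=(n_samples, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + 0.1 * rng.normal(size=n_samples)

# Prepend a column of ones so the intercept w_0 is part of the solution
X_design = np.column_stack([np.ones(n_samples), X])

# Solve min_w ||X w - y||_2^2
w_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

intercept, coefs = w_hat[0], w_hat[1:]
print("Intercept:", np.round(intercept, 3))
print("Coefficients:", np.round(coefs, 3))
```

The estimates should land close to the values used to generate the data, which is the same kind of result `LinearRegression` reports through `intercept_` and `coef_`.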


**In the case of Simple Linear Regression, we have one independent variable or feature**

This example uses only one feature of the diabetes dataset (the column at index 2, body mass index), so that the regression can be illustrated in a two-dimensional plot. The straight line in the plot shows how linear regression attempts to draw the line that best minimizes the residual sum of squares between the observed responses in the dataset and the responses predicted by the linear approximation.

The coefficients, the mean squared error (the residual sum of squares divided by the number of samples) and the coefficient of determination ($R^2$) are also calculated.
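
For reference, here is a small sketch of how these two metrics can be computed by hand. The helper names `mse` and `r_squared` and the toy arrays are my own assumptions for illustration; in the notebook itself I rely on `mean_squared_error` and `r2_score` from scikit-learn.

```python
import numpy as np

def mse(y_true, y_pred):
    """Mean squared error: residual sum of squares divided by n."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return np.mean((y_true - y_pred) ** 2)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Toy values, assumed only for illustration
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 9.0])

print("MSE:", mse(y_true, y_pred))
print("R_Squared:", r_squared(y_true, y_pred))
```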

In [None]:
import numpy as np
import matplotlib.pyplot as plt

# Import linear model and dataset
from sklearn import datasets, linear_model

# Import performance metrics
from sklearn.metrics import mean_squared_error, r2_score

# Import train_test_split
from sklearn.model_selection import train_test_split

In [None]:
# Load the dataset

X, y = datasets.load_diabetes(return_X_y=True)

In [None]:
# As this is a Simple Linear Regression we only need one feature or independent variable

X = X[:, np.newaxis, 2]

In [None]:
# Split X and y into train and test, with test size 20%

train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2)

In [None]:
# Create Simple Linear Regression Model

reg = linear_model.LinearRegression(fit_intercept=True)

In [None]:
# Train the model

result = reg.fit(train_x, train_y)

In [None]:
# Make prediction on the test set

y_pred = result.predict(test_x)

In [None]:
# Print the coefficients

print("Intercept is: {}, and coefficient estimate is {}".format(np.round(result.intercept_, 3), np.round(result.coef_,3)))
print()
print("Mean Squared Error is {}".format(np.round(mean_squared_error(test_y, y_pred),3)))
print()
print("R_Squared is {}".format(np.round(r2_score(test_y, y_pred),3)))

In [None]:
# Let's make some plots


plt.scatter(test_x, test_y,  color='black')
plt.plot(test_x, y_pred, color='blue', linewidth=3)

plt.xticks(())
plt.yticks(())


plt.axhline(y=0, color='k')
plt.axvline(x=0, color='k')

plt.show()

# Multiple Linear Regression

I will use the same dataset with the full set of features, since this is an example of multiple linear regression.

In [None]:
# Load the dataset

X, y = datasets.load_diabetes(return_X_y=True)

In [None]:
# Split X and y into train and test, with test size 20%

train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2)

In [None]:
# Create Multiple Linear Regression Model

reg = linear_model.LinearRegression(fit_intercept=True)

In [None]:
# Train the model

result = reg.fit(train_x, train_y)

In [None]:
# Make prediction on the test set

y_pred = result.predict(test_x)

In [None]:
# Print the coefficients

print("Intercept is: {}".format(np.round(result.intercept_, 3)))
print()
print("Coefficient estimates are {}".format(np.round(result.coef_, 3)))
print()
print("Mean Squared Error is {}".format(np.round(mean_squared_error(test_y, y_pred),3)))
print()
print("R_Squared is {}".format(np.round(r2_score(test_y, y_pred),3)))

# Ridge

Ridge regression addresses some of the problems of Ordinary Least Squares by imposing a penalty on the size of the coefficients. The ridge coefficients minimize a penalized residual sum of squares:

$$
\min_{w} || X w - y||_2^2 + \alpha ||w||_2^2
$$

The complexity parameter $\alpha \geq 0$ controls the amount of shrinkage: the larger the value of $\alpha$,  the greater the amount of shrinkage and thus the coefficients become more robust to collinearity.
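
The shrinkage effect can be seen in a short sketch that uses the closed-form solution of the penalized objective above, $w = (X^T X + \alpha I)^{-1} X^T y$. The helper `ridge_closed_form` and the synthetic, nearly collinear data are assumptions for illustration, and the intercept is left out to match the formula as written.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data with two nearly collinear features (assumed for illustration)
n_samples = 100
x1 = rng.normal(size=n_samples)
x2 = x1 + 0.01 * rng.normal(size=n_samples)
X = np.column_stack([x1, x2])
y = x1 + x2 + 0.1 * rng.normal(size=n_samples)

def ridge_closed_form(X, y, alpha):
    """Minimize ||Xw - y||_2^2 + alpha * ||w||_2^2 (no intercept)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

alphas = [0.0, 0.1, 1.0, 10.0, 100.0]
norms = []
for alpha in alphas:
    w = ridge_closed_form(X, y, alpha)
    norms.append(float(np.linalg.norm(w)))
    print("alpha = {}: w = {}, ||w||_2 = {}".format(alpha, np.round(w, 3), np.round(norms[-1], 3)))
```

The norm of the coefficient vector does not grow as $\alpha$ increases. With $\alpha = 0$ the solution is ordinary least squares, which can be unstable when the features are this strongly correlated.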

**I will use the same dataset, with the full feature set**

In [None]:
# Load the dataset

X, y = datasets.load_diabetes(return_X_y=True)

In [None]:
# Split X and y into train and test, with test size 20%

train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2)

In [None]:
# Create Ridge Regression Model. This is a Ridge regression with built-in cross-validation.
# I indicated 10-fold cross-validation

reg = linear_model.RidgeCV(alphas=[1e-3, 1e-2, 1e-1, 1, 10], cv=10, fit_intercept=True)

In [None]:
# Train the model on the training split only, so the test set stays unseen

result = reg.fit(train_x, train_y)

In [None]:
# Print model output

print("Model intercept is {}".format(np.round(result.intercept_, 3)))
print()
print("Coefficient estimates are {}".format(np.round(result.coef_, 3)))
print()
print("Best alpha is {}".format(result.alpha_))
print()
print("R_squared is {}".format(np.round(result.score(test_x, test_y), 3)))

# Lasso

The Lasso is a linear model that estimates sparse coefficients. It is useful in some contexts due to its tendency to prefer solutions with fewer non-zero coefficients, effectively reducing the number of features upon which the given solution is dependent. For this reason Lasso and its variants are fundamental to the field of compressed sensing. Under certain conditions, it can recover the exact set of non-zero coefficients.

Mathematically, it consists of a linear model with an added regularization term. The objective function to minimize is:

$$
\min_{w} { \frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha ||w||_1}
$$


The lasso estimate thus solves the minimization of the least-squares penalty with $\alpha ||w||_1$ added, where $\alpha$ is a constant and $||w||_1$ is the $\ell_1$-norm of the coefficient vector. The alpha parameter controls the degree of sparsity of the estimated coefficients.
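
To show where the sparsity comes from, here is a minimal coordinate-descent sketch for the objective above. It is only an illustration under my own assumptions (synthetic data, no intercept, a fixed number of sweeps, hypothetical helper names), not the solver scikit-learn ships. Each coordinate update applies a soft-thresholding step, which sets small coefficients exactly to zero.

```python
import numpy as np

def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes 0."""
    return np.sign(value) * max(abs(value) - threshold, 0.0)

def lasso_coordinate_descent(X, y, alpha, n_iter=200):
    """Minimize (1 / (2 n)) * ||Xw - y||_2^2 + alpha * ||w||_1 (no intercept)."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    col_sq = (X ** 2).sum(axis=0) / n_samples  # (1/n) * ||x_j||^2 per feature
    for _ in range(n_iter):
        for j in range(n_features):
            # Residual with feature j's current contribution removed
            residual = y - X @ w + X[:, j] * w[j]
            rho = X[:, j] @ residual / n_samples
            w[j] = soft_threshold(rho, alpha) / col_sq[j]
    return w

# Synthetic data (assumed for illustration): only two features matter
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_w = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ true_w + 0.1 * rng.normal(size=200)

for alpha in [0.01, 0.5, 5.0]:
    w = lasso_coordinate_descent(X, y, alpha)
    print("alpha = {}: w = {}, non-zero = {}".format(alpha, np.round(w, 3), np.count_nonzero(w)))
```

Once $\alpha$ is at least $\max_j |x_j^T y| / n_{\text{samples}}$, every coefficient is thresholded to zero, so larger values of $\alpha$ give sparser solutions.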

**I will use the same dataset, with the full feature set**

In [None]:
# Load the dataset

X, y = datasets.load_diabetes(return_X_y=True)

In [None]:
# Split X and y into train and test, with test size 20%

train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2)

In [None]:
# Create Lasso Regression Model. This is a Lasso regression with built-in cross-validation.
# I indicated 10-fold cross-validation

reg = linear_model.LassoCV(alphas=[1e-3, 1e-2, 1e-1, 1, 10], cv=10, fit_intercept=True)

In [None]:
# Train the model on the training split only, so the test set stays unseen

result = reg.fit(train_x, train_y)

In [None]:
# Print model output

print("Model intercept is {}".format(np.round(result.intercept_, 3)))
print()
print("Coefficient estimates are {}".format(np.round(result.coef_, 3)))
print()
print("Best alpha is {}".format(result.alpha_))
print()
print("R_squared is {}".format(np.round(result.score(test_x, test_y), 3)))

# Elastic-Net

ElasticNet is a linear regression model trained with both $\ell_1$ and $\ell_2$-norm regularization of the coefficients. This combination allows for learning a sparse model where few of the weights are non-zero, like Lasso, while still maintaining the regularization properties of Ridge. We control the convex combination of $\ell_1$ and $\ell_2$ using the l1_ratio parameter.


Elastic-net is useful when there are multiple features which are correlated with one another. Lasso is likely to pick one of these at random, while elastic-net is likely to pick both.

A practical advantage of trading-off between Lasso and Ridge is that it allows Elastic-Net to inherit some of Ridge’s stability under rotation.


The objective function to minimize is in this case:

$$
\min_{w} { \frac{1}{2n_{\text{samples}}} ||X w - y||_2 ^ 2 + \alpha \rho ||w||_1 +
\frac{\alpha(1-\rho)}{2} ||w||_2 ^ 2}
$$
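
The sketch below evaluates this objective term by term, with $\rho$ playing the role of the l1_ratio parameter. The function name `elastic_net_objective` and the toy numbers are assumptions for illustration. Setting $\rho = 1$ leaves only the $\ell_1$ penalty (the Lasso objective), while $\rho = 0$ leaves only the squared $\ell_2$ penalty.

```python
import numpy as np

def elastic_net_objective(X, y, w, alpha, rho):
    """(1 / (2 n)) * ||Xw - y||_2^2 + alpha * rho * ||w||_1
    + alpha * (1 - rho) / 2 * ||w||_2^2"""
    n_samples = X.shape[0]
    residual = X @ w - y
    data_fit = residual @ residual / (2 * n_samples)
    l1_penalty = alpha * rho * np.sum(np.abs(w))
    l2_penalty = 0.5 * alpha * (1 - rho) * np.sum(w ** 2)
    return data_fit + l1_penalty + l2_penalty

# Toy values, assumed only for illustration
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 1.0])
alpha = 0.5

for rho in [0.0, 0.5, 1.0]:
    value = elastic_net_objective(X, y, w, alpha, rho)
    print("rho = {}: objective = {}".format(rho, value))
```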

**I will use the same dataset, with the full feature set**

In [None]:
# Load the dataset

X, y = datasets.load_diabetes(return_X_y=True)

In [None]:
# Split X and y into train and test, with test size 20%

train_x, test_x, train_y, test_y = train_test_split(X, y, test_size=0.2)

In [None]:
# Create Elastic-Net Regression Model. This is an Elastic-Net regression with built-in cross-validation.
# I specify 10-fold cross-validation.
# The class ElasticNetCV can be used to set the parameters alpha and l1_ratio by cross-validation.

reg = linear_model.ElasticNetCV(l1_ratio=[1e-4,1e-3,1e-2,.1, .5, .7, .9, .95, .99, 1],
                                alphas=[1e-3, 1e-2, 1e-1, 1, 10], 
                                cv=10, fit_intercept=True)

In [None]:
# Train the model on the training split only, so the test set stays unseen

result = reg.fit(train_x, train_y)

In [None]:
# Print model output

print("Model intercept is {}".format(np.round(result.intercept_, 3)))
print()
print("Coefficient estimates are {}".format(np.round(result.coef_, 3)))
print()
print("Best alpha is {}".format(result.alpha_))
print()
print("Best l1_ratio is {}".format(result.l1_ratio_))
print()
print("R_squared is {}".format(np.round(result.score(test_x, test_y), 3)))