#### Lasso Regression :
    
Lasso stands for Least Absolute Shrinkage and Selection Operator.
It adds a penalty term to the cost function: the sum of the absolute values of the coefficients. As the coefficients move away from 0 this term grows, so the model is pushed to shrink them in order to reduce the loss. The difference between Ridge and Lasso is that Lasso can shrink coefficients to exactly zero, whereas Ridge only shrinks them toward zero and, in general, never sets a coefficient exactly to zero.


 \hat{\beta}_{lasso} = \arg\min_{\beta}\left( {\left\| Y - X\beta \right\|}_{2}^{2} + \lambda {\left\| \beta \right\|}_{1} \right)
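To see why the L1 penalty can produce exact zeros while the L2 penalty does not, it helps to look at the special case of orthonormal predictors (X^T X = I), where both estimators can be written directly in terms of the ordinary least squares coefficients. The sketch below assumes that special case and the un-normalised objective written above; the function names, coefficient values and the value of lambda are made up for illustration.

```python
import numpy as np

def lasso_orthonormal(b_ols, lam):
    # Minimiser of ||Y - X b||^2 + lam * ||b||_1 when X^T X = I:
    # each OLS coefficient is soft-thresholded at lam / 2.
    return np.sign(b_ols) * np.maximum(np.abs(b_ols) - lam / 2.0, 0.0)

def ridge_orthonormal(b_ols, lam):
    # Minimiser of ||Y - X b||^2 + lam * ||b||_2^2 when X^T X = I:
    # each OLS coefficient is scaled by the same factor.
    return b_ols / (1.0 + lam)

b_ols = np.array([3.0, -0.4, 0.2, -1.5])  # hypothetical OLS coefficients
lam = 1.0

b_lasso = lasso_orthonormal(b_ols, lam)
b_ridge = ridge_orthonormal(b_ols, lam)

print("OLS  :", b_ols)
print("Lasso:", b_lasso)   # small coefficients become exactly zero
print("Ridge:", b_ridge)   # every coefficient shrinks but stays non-zero
```

Coefficients whose magnitude is below lambda/2 are cut to zero by Lasso, while Ridge only rescales them. With correlated predictors there is no such closed form for Lasso, but the qualitative behaviour is the same.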


Limitations of Lasso Regression:

Lasso struggles with some types of data.

If the number of predictors (p) is greater than the number of observations (n), Lasso will select at most n predictors with non-zero coefficients, even if all predictors are relevant.

If two or more variables are highly collinear, Lasso tends to keep one of them and drop the others, and which one it keeps is essentially arbitrary. This makes the selected model harder to interpret.
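The first limitation can be checked on synthetic data. The sketch below is only an illustration under assumed settings: the data are random, every predictor is given a non-zero true coefficient, and the values of `n`, `p` and `alpha` are arbitrary choices rather than anything taken from the housing data used later.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n, p = 20, 100                       # more predictors than observations
X = rng.standard_normal((n, p))
true_coef = rng.standard_normal(p)   # every predictor contributes to y
y = X @ true_coef

lasso = Lasso(alpha=2.0, max_iter=50_000)
lasso.fit(X, y)

# Count how many coefficients Lasso left non-zero.
n_selected = np.count_nonzero(lasso.coef_)
print(f"{n_selected} of {p} predictors kept, with n = {n} observations")
```

Even though all 100 predictors are relevant by construction, the number of selected predictors cannot exceed the number of observations.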



L1 Normalization

As a normalization technique, it rescales the dataset values so that, in each row, the absolute values sum to 1. The L1 norm is also used as a loss function, where it is called Least Absolute Deviations.

1.As a loss function it is also known as least absolute deviations (LAD) or least absolute errors (LAE)

2.It minimizes the sum of the absolute differences (S) between the target values (Yi) and the estimated values

3.In other words, as a normalization: sum of absolute values = 1
    Example: if this norm is applied along a row, then the sum of absolute values for that row = 1.
    
4.It is robust to outliers, because errors are not squared and large residuals are penalized only linearly

5.Sparsity:
    Means that only very few entries in a matrix (or vector) are non-zero.
    Used as a penalty, the L1-norm tends to produce many coefficients that are exactly zero, with a few large coefficients.
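The robustness to outliers can be illustrated with the simplest possible model, a single constant prediction. The sketch below uses made-up numbers and a brute-force grid search (an assumption chosen for transparency, not an efficient method): the constant that minimizes the sum of absolute errors is the median, while the one that minimizes the sum of squared errors is the mean.

```python
import numpy as np

y = np.array([10.0, 11.0, 9.0, 10.0, 100.0])   # last value is an outlier

# Candidate constant predictions on a fine grid (step 0.01).
candidates = np.linspace(0.0, 100.0, 10001)

# Loss of each candidate under the two criteria.
l1_loss = np.abs(y[:, None] - candidates).sum(axis=0)     # sum of absolute errors
l2_loss = ((y[:, None] - candidates) ** 2).sum(axis=0)    # sum of squared errors

best_l1 = candidates[np.argmin(l1_loss)]
best_l2 = candidates[np.argmin(l2_loss)]

print("L1 (LAD) fit:", best_l1, "| median:", np.median(y))
print("L2 (LS) fit :", best_l2, "| mean  :", y.mean())
```

The single outlier drags the least squares fit far away from the bulk of the data, while the least absolute deviations fit stays with the typical values.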

L2 Normalization

1.As a loss function it is also known as least squares

2.Sum of squares = 1
    Example: if this norm is applied along a row, then the sum of squares for that row = 1.
    
3.Takes outliers into consideration during training:
    because errors are squared, it is sensitive to outliers in the data rather than resistant to them.
    
4.Computational efficiency:
    The L1-norm does not have an analytical solution, but the L2-norm does.
    This allows L2-norm solutions to be computed efficiently.
    However, L1-norm solutions have the sparsity property, which allows them to be used with sparse algorithms.
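The row-wise meaning of "sum of absolute values = 1" and "sum of squares = 1" can be shown in a few lines of NumPy. This is a minimal sketch with a made-up matrix; `normalize_rows` is a hypothetical helper name, not a library function.

```python
import numpy as np

X = np.array([[3.0, -4.0],
              [1.0,  1.0],
              [0.5, -1.5]])

def normalize_rows(X, order):
    # Divide each row by its own L1 (order=1) or L2 (order=2) norm.
    norms = np.linalg.norm(X, ord=order, axis=1, keepdims=True)
    return X / norms

X_l1 = normalize_rows(X, 1)
X_l2 = normalize_rows(X, 2)

print(np.abs(X_l1).sum(axis=1))   # absolute values of each row sum to 1
print((X_l2 ** 2).sum(axis=1))    # squares of each row sum to 1
```

For the first row, the L1 norm is 7 and the L2 norm is 5, so the row becomes (3/7, -4/7) under L1 normalization and (0.6, -0.8) under L2 normalization.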


#### Elastic Net :

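Sometimes Lasso introduces bias by making the prediction depend too heavily on a particular variable, for example when it keeps only one of several correlated predictors.
In these cases, Elastic Net often performs better because it combines the regularization of both Lasso and Ridge.
Its advantage is that it does not as readily eliminate the coefficients of highly collinear variables.

In Elastic Net regularization we add both the L1 and L2 terms to get the final loss function.

This leads us to minimize the following loss function:

 \hat{\beta}_{enet} = \arg\min_{\beta}\left( {\left\| Y - X\beta \right\|}_{2}^{2} + \lambda_{1} {\left\| \beta \right\|}_{1} + \lambda_{2} {\left\| \beta \right\|}_{2}^{2} \right)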
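scikit-learn's `ElasticNet`, used later in this notebook, does not take two separate penalty weights; it uses an overall strength `alpha` and a mixing parameter `l1_ratio`. The sketch below writes out that documented objective in plain NumPy so the roles of the two parameters are visible. The helper name `elastic_net_objective` and the tiny data set are assumptions made for illustration.

```python
import numpy as np

def elastic_net_objective(w, X, y, alpha, l1_ratio):
    # 1/(2n) * ||y - Xw||^2 + alpha*l1_ratio*||w||_1
    #                       + 0.5*alpha*(1 - l1_ratio)*||w||_2^2
    n_samples = X.shape[0]
    residual = y - X @ w
    data_fit = residual @ residual / (2.0 * n_samples)
    l1_term = alpha * l1_ratio * np.abs(w).sum()
    l2_term = 0.5 * alpha * (1.0 - l1_ratio) * (w @ w)
    return data_fit + l1_term + l2_term

X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 5.0])
w = np.array([1.0, 2.0])   # hypothetical coefficient vector to evaluate

pure_l1 = elastic_net_objective(w, X, y, alpha=0.5, l1_ratio=1.0)  # Lasso penalty only
pure_l2 = elastic_net_objective(w, X, y, alpha=0.5, l1_ratio=0.0)  # Ridge-style penalty only
mixed = elastic_net_objective(w, X, y, alpha=0.5, l1_ratio=0.2)    # blend of both

print(pure_l1, pure_l2, mixed)
```

Setting `l1_ratio=1.0` recovers the Lasso objective, `l1_ratio=0.0` leaves only the squared penalty, and values in between blend the two.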


In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.linear_model import LinearRegression

from sklearn.linear_model import Ridge

from sklearn.linear_model import Lasso

from sklearn.linear_model import ElasticNet

In [5]:
df=pd.read_csv("C:\\Users\\Pratik1\\Desktop\\dataset\\data's\\housing.csv")

In [6]:
df.head()

      RM  LSTAT  PTRATIO    MEDV
0  6.575   4.98     15.3  504000
1  6.421   9.14     17.8  453600
2  7.185   4.03     17.8  728700
3  6.998   2.94     18.7  701400
4  7.147   5.33     18.7  760200


In [7]:
X = df.drop('MEDV', axis=1)

y = df['MEDV']

from sklearn import preprocessing

X = preprocessing.scale(X)

y = preprocessing.scale(y)

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.30, random_state=1)
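The cell above scales the full data set before splitting it, so the test rows influence the scaling statistics. One common alternative, sketched below, is to split first and let a pipeline fit the scaler on the training rows only. This is not the notebook's method: the data are a synthetic stand-in with roughly similar column scales (the CSV is not available here), and the `_demo` names are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for three features and a target (489 rows, as in the notebook).
rng = np.random.default_rng(1)
X_demo = rng.normal(loc=[6.0, 12.0, 18.0], scale=[0.7, 7.0, 2.0], size=(489, 3))
y_demo = (5.0 * X_demo[:, 0] - 0.6 * X_demo[:, 1] - 1.0 * X_demo[:, 2]
          + rng.normal(size=489))

# Split first, so the test rows stay unseen during scaling.
Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.30, random_state=1)

# The scaler is fitted inside the pipeline, on the training rows only.
model = make_pipeline(StandardScaler(), Ridge(alpha=0.3))
model.fit(Xtr, ytr)

scaler = model.named_steps["standardscaler"]
print("scaler means (train only):", scaler.mean_)
print("test R^2:", model.score(Xte, yte))
```

With this arrangement the same training-set means and standard deviations are reused automatically when the model scores or predicts on the test rows.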

In [8]:
print(X_train.shape)

print(y_train.shape)

print(X_test.shape)

print(y_test.shape)

(342, 3)
(342,)
(147, 3)
(147,)


In [9]:
regression_model = LinearRegression()

regression_model.fit(X_train, y_train)

LinearRegression()

The optimization objective for Ridge is:

||y - Xw||^2_2 + alpha * ||w||^2_2
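Because this objective is a smooth quadratic in w, its minimizer has a closed form: w = (X^T X + alpha I)^{-1} X^T y. The sketch below checks this numerically on synthetic data. It assumes no intercept (the data are centred first), and `ridge_closed_form` is a hypothetical helper, not scikit-learn's implementation.

```python
import numpy as np

def ridge_closed_form(X, y, alpha):
    # Solve (X^T X + alpha * I) w = X^T y for w (no intercept term).
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))
y = X @ np.array([0.3, -0.5, -0.25]) + 0.1 * rng.standard_normal(50)

# Centre the data so that leaving out the intercept is harmless.
X = X - X.mean(axis=0)
y = y - y.mean()

alpha = 0.3
w = ridge_closed_form(X, y, alpha)

# Gradient of ||y - Xw||^2 + alpha * ||w||^2; it should vanish at the minimizer.
gradient = -2.0 * X.T @ (y - X @ w) + 2.0 * alpha * w
print("w:", w)
print("gradient at w:", gradient)

# A much larger alpha shrinks the coefficients toward zero.
w_strong = ridge_closed_form(X, y, 1e6)
```

The gradient is zero up to floating-point error, which confirms that the closed-form vector minimizes the objective.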

In [10]:
ridge = Ridge(alpha=.3)

ridge.fit(X_train,y_train)

print ("Ridge model:", (ridge.coef_))

Ridge model: [ 0.31037007 -0.46008909 -0.25831074]


In [11]:
lasso = Lasso(alpha=0.1)

lasso.fit(X_train,y_train)

print ("Lasso model:", (lasso.coef_))

Lasso model: [ 0.26550839 -0.42151352 -0.18794559]


The optimization objective for Lasso is:
    
(1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

In [12]:
print("Linear Regression Model Training Score: ", regression_model.score(X_train, y_train))

print("Linear Regression Model Testing Score: ",regression_model.score(X_test, y_test))

print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))

print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))

print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))
print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))

Linear Regression Model Training Score:  0.7105567540949002
Linear Regression Model Testing Score:  0.7281579138457909
Ridge Regression Model Training Score:  0.7105566037528164
Ridge Regression Model Testing Score:  0.7281166128672447
Lasso Regression Model Training Score:  0.6958190280767107
Lasso Regression Model Testing Score:  0.7011222584065626


In [13]:
print("Linear Regression Model Coefficient :",regression_model.coef_)
print("Ridge Regression Model Coefficient :",ridge.coef_)
print("Lasso Regression Model Coefficient :",lasso.coef_)

Linear Regression Model Coefficient : [ 0.31038811 -0.46041158 -0.25840905]
Ridge Regression Model Coefficient : [ 0.31037007 -0.46008909 -0.25831074]
Lasso Regression Model Coefficient : [ 0.26550839 -0.42151352 -0.18794559]


In [14]:
from sklearn.preprocessing import PolynomialFeatures

poly = PolynomialFeatures(degree = 2, interaction_only=True)

X_poly = poly.fit_transform(X)

X_train, X_test, y_train, y_test = train_test_split(X_poly, y, test_size=0.30, random_state=1)

regression_model.fit(X_train, y_train)

print(regression_model.coef_[0])

0.0
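The zero first coefficient is explained by the columns that `PolynomialFeatures` generates. With `degree=2` and `interaction_only=True`, three input features expand into seven columns: a constant 1, the three original features, and the three pairwise products. The small example below uses a made-up row and the hypothetical name `poly_demo` to make the layout visible.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

poly_demo = PolynomialFeatures(degree=2, interaction_only=True)

row = np.array([[2.0, 3.0, 5.0]])      # one sample with features a, b, c
expanded = poly_demo.fit_transform(row)

# Columns are: 1, a, b, c, a*b, a*c, b*c
print(expanded)
```

The first column is the same constant for every sample. Since `LinearRegression` fits its own intercept by default, that column carries no extra information and its coefficient comes out as 0. Passing `include_bias=False` to `PolynomialFeatures` would drop the column.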


In [15]:
ridge = Ridge(alpha=.3)

ridge.fit(X_train,y_train)

print ("Ridge model:", (ridge.coef_))

Ridge model: [ 0.          0.27107574 -0.57159397 -0.17077965 -0.20051563 -0.18439328
 -0.14349348]


In [16]:
lasso = Lasso(alpha=0.003)

lasso.fit(X_train,y_train)

print ("Lasso model:", (lasso.coef_))

Lasso model: [ 0.          0.26858583 -0.571742   -0.16961397 -0.19866367 -0.17956566
 -0.13553868]


In [17]:
print("Linear Regression Model Training Score: ", regression_model.score(X_train, y_train))

print("Linear Regression Model Testing Score: ",regression_model.score(X_test, y_test))

print("Ridge Regression Model Training Score: ",ridge.score(X_train, y_train))

print("Ridge Regression Model Testing Score: ",ridge.score(X_test, y_test))

print("Lasso Regression Model Training Score: ",lasso.score(X_train, y_train))

print("Lasso Regression Model Testing Score: ",lasso.score(X_test, y_test))

Linear Regression Model Training Score:  0.8244073780079281
Linear Regression Model Testing Score:  0.8149670026064411
Ridge Regression Model Training Score:  0.8244070096412397
Ridge Regression Model Testing Score:  0.8149895819342466
Lasso Regression Model Training Score:  0.8243524495465678
Lasso Regression Model Testing Score:  0.8150507962199911


In [18]:
print("Linear Regression Model Coefficient :",regression_model.coef_)
print("Ridge Regression Model Coefficient :",ridge.coef_)
print("Lasso Regression Model Coefficient :",lasso.coef_)

Linear Regression Model Coefficient : [ 0.          0.27097105 -0.57225914 -0.17066546 -0.20080135 -0.1844696
 -0.14374296]
Ridge Regression Model Coefficient : [ 0.          0.27107574 -0.57159397 -0.17077965 -0.20051563 -0.18439328
 -0.14349348]
Lasso Regression Model Coefficient : [ 0.          0.26858583 -0.571742   -0.16961397 -0.19866367 -0.17956566
 -0.13553868]


In [19]:
from sklearn.linear_model import ElasticNet

# The `normalize` argument was removed in scikit-learn 1.2; False was its default.
ENreg = ElasticNet(alpha=0.5, l1_ratio=0.2)

ENreg.fit(X_train,y_train)

pred_cv = ENreg.predict(X_train)


ENreg.score(X_train,y_train)

0.7306970161020188
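The score above is measured on the training data with hand-picked values of `alpha` and `l1_ratio`. One way to choose them is cross-validation with `ElasticNetCV`, followed by a check on held-out data. The sketch below uses synthetic data with seven columns as a stand-in for the polynomial features; the candidate `l1_ratio` grid and all `_demo` names are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 489 rows, 7 columns, one of them irrelevant.
rng = np.random.default_rng(2)
X_demo = rng.standard_normal((489, 7))
coef_demo = np.array([0.0, 0.3, -0.6, -0.2, -0.2, -0.2, -0.1])
y_demo = X_demo @ coef_demo + 0.4 * rng.standard_normal(489)

Xtr, Xte, ytr, yte = train_test_split(X_demo, y_demo, test_size=0.30, random_state=1)

# Cross-validate over a grid of mixing ratios; alphas are chosen automatically.
l1_grid = [0.2, 0.5, 0.8, 1.0]
en_cv = ElasticNetCV(l1_ratio=l1_grid, cv=5)
en_cv.fit(Xtr, ytr)

print("chosen alpha   :", en_cv.alpha_)
print("chosen l1_ratio:", en_cv.l1_ratio_)
print("train R^2:", en_cv.score(Xtr, ytr))
print("test R^2 :", en_cv.score(Xte, yte))
```

Reporting the test score next to the training score makes it easier to compare Elastic Net with the Linear, Ridge and Lasso results printed earlier.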