## Lasso Regression 

Lasso stands for Least Absolute Shrinkage and Selection Operator.

Lasso regression works like ridge regression, but it penalizes the $l_{1}$ norm of the weights instead of the squared $l_{2}$ norm.

### Difference Between Lasso and Ridge Regression 

- Ridge Regression shrinks the weights of less important features close to zero, but rarely to exactly zero.
- Lasso Regression tends to set the weights of less important features to exactly zero, which effectively removes those features from the model.
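
The difference is easy to see on synthetic data. The sketch below is only an illustration under assumed settings: the data, the seed, the `alpha` values, and the `*_demo` names are made up for this demo and are not part of the notebook. Only the first two of the five features influence the target.

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Hypothetical demo data: 5 features, only the first two matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 0.1 * rng.normal(size=200)

ridge_demo = Ridge(alpha=1.0).fit(X_demo, y_demo)
lasso_demo = Lasso(alpha=0.1).fit(X_demo, y_demo)

# Ridge keeps small non-zero weights on the irrelevant features,
# while lasso typically sets them to exactly zero.
print("ridge:", ridge_demo.coef_)
print("lasso:", lasso_demo.coef_)
print("exact zeros (ridge):", int(np.sum(ridge_demo.coef_ == 0)))
print("exact zeros (lasso):", int(np.sum(lasso_demo.coef_ == 0)))
```

Counting exact zeros, rather than eyeballing small values, is what separates feature selection from mere shrinkage.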

The Ridge Regression cost function looks like:-  
    $J(\Theta) = \frac{1}{m} \sum \limits _{i = 1} ^{m} (\Theta^{T}x^{(i)} - y^{(i)})^{2} + \frac{\alpha}{2} \sum \limits _{i = 1} ^{n} \Theta_{i}^{2}$ 

===============================================================================================

The Lasso Regression cost function looks like:-  
    $J(\Theta) = \frac{1}{m} \sum \limits _{i = 1} ^{m} (\Theta^{T}x^{(i)} - y^{(i)})^{2} + \alpha \sum \limits _{i = 1} ^{n} |\Theta_{i}|$ 

===============================================================================================
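
To make the two cost functions concrete, here is a minimal NumPy sketch. It assumes the data term is the mean squared error over the $m$ samples, and that `theta` holds only the feature weights with no separate bias. The helper names (`mse`, `ridge_cost`, `lasso_cost`) and the toy numbers are made up for illustration.

```python
import numpy as np

def mse(theta, X, y):
    """Mean squared error of the linear model X @ theta."""
    residual = X.dot(theta) - y
    return np.mean(residual ** 2)

def ridge_cost(theta, X, y, alpha):
    """MSE plus half of alpha times the squared l2 norm of theta."""
    return mse(theta, X, y) + 0.5 * alpha * np.sum(theta ** 2)

def lasso_cost(theta, X, y, alpha):
    """MSE plus alpha times the l1 norm of theta."""
    return mse(theta, X, y) + alpha * np.sum(np.abs(theta))

# Toy example: predictions are [1, -2], targets are [1, 0], so MSE = (0 + 4) / 2 = 2.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
y_toy = np.array([1.0, 0.0])
theta_toy = np.array([1.0, -2.0])

print(ridge_cost(theta_toy, X_toy, y_toy, alpha=0.5))  # 2 + 0.25 * (1 + 4) = 3.25
print(lasso_cost(theta_toy, X_toy, y_toy, alpha=0.5))  # 2 + 0.5 * (1 + 2) = 3.5
```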

In [1]:
### Implementation of Lasso regression from scratch ### 
import numpy as np
class LassoRegression() :
      
    def __init__( self, learning_rate, iterations, l1_penality ) :
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.l1_penality = l1_penality
              
    def fit( self, X, Y ) :
                    
        self.m, self.n = X.shape          
        self.W = np.zeros( self.n )
        self.b = 0
        self.X = X
        self.Y = Y
                  
        for i in range( self.iterations ) :
            self.update_weights() 
        return self

    def update_weights( self ) :
        Y_pred = self.predict( self.X )
        residual = self.Y - Y_pred
        # Subgradient of the l1 term: +1 for positive weights, -1 otherwise
        # (any value in [-1, 1] is valid at exactly zero; -1 is used here)
        l1_sign = np.where( self.W > 0, 1.0, -1.0 )
        # The penalty is divided by m along with the squared-error term,
        # so the effective penalty strength is l1_penality / m
        dW = ( - 2 * self.X.T.dot( residual ) + self.l1_penality * l1_sign ) / self.m
        db = - 2 * np.sum( residual ) / self.m
        # Gradient descent step
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self
      
    def predict( self, X ) :
        return X.dot( self.W ) + self.b
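
A note on exact zeros: plain (sub)gradient steps like the ones above usually leave unimportant weights hovering near zero rather than at exactly zero. One common alternative is proximal gradient descent (often called ISTA), which follows each gradient step on the squared error with a soft-thresholding step. The sketch below is a different technique from the class above, not a rewrite of it. The names `soft_threshold` and `lasso_ista`, the demo data, and the hyperparameters are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(w, t):
    """Shrink each entry of w toward zero by t; entries within t of zero become 0."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(X, y, alpha, learning_rate=0.05, iterations=2000):
    """Proximal gradient descent for (1/m) * sum((Xw + b - y)^2) + alpha * ||w||_1."""
    m, n = X.shape
    w = np.zeros(n)
    b = 0.0
    for _ in range(iterations):
        residual = X.dot(w) + b - y
        grad_w = 2.0 * X.T.dot(residual) / m   # gradient of the MSE term only
        grad_b = 2.0 * np.sum(residual) / m
        # Gradient step on the smooth part, then the proximal (soft-threshold) step for l1
        w = soft_threshold(w - learning_rate * grad_w, learning_rate * alpha)
        b = b - learning_rate * grad_b         # the bias is not penalized
    return w, b

# Hypothetical demo data: only the first two of five features matter.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 5))
y_demo = 3.0 * X_demo[:, 0] - 2.0 * X_demo[:, 1] + 0.1 * rng.normal(size=200)

w_ista, b_ista = lasso_ista(X_demo, y_demo, alpha=0.5)
print(w_ista)  # the last three weights should come out as exact zeros
```

The learning rate has to stay small enough for the gradient step to be stable; 0.05 works for this standardized demo data but is not a general recommendation.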

In [2]:
### Implementation of Lasso Regression in Sklearn ### Recommended 
import numpy as np
from sklearn.linear_model import Lasso

# Noisy quadratic toy data: y = 0.5 * x^2 + x + 2 + Gaussian noise
m = 100
X = 6 * np.random.rand(m, 1) - 3
y = 0.5 * X**2 + X + 2 + np.random.randn(m, 1) 

lasso = Lasso(alpha=0.1)  # alpha sets the strength of the l1 penalty
lasso.fit(X, y)
lasso.predict([[1.5]])

array([4.69259227])

## Elastic Net

Elastic Net combines ridge and lasso regression. Its hyperparameter **r** is the mix ratio: with **r = 0** it is equivalent to **Ridge Regression**, and with **r = 1** it is equivalent to **Lasso Regression**. So **r** is one more hyperparameter that has to be tuned. 

===============================================================================================

J($\Theta$) = MSE($\Theta$) + $r\alpha \sum \limits_{i = 1}^n |\Theta_{i}| + \frac {1 - r}{2} \alpha \sum \limits_{i = 1}^n \Theta ^ {2}_{i}$

===============================================================================================
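
The role of **r** can be checked numerically. The sketch below is a minimal NumPy version of the cost above, assuming `theta` holds only the feature weights with no separate bias. The function name `elastic_net_cost` and the toy numbers are made up for illustration.

```python
import numpy as np

def elastic_net_cost(theta, X, y, alpha, r):
    """MSE + r * alpha * l1 penalty + (1 - r) / 2 * alpha * squared l2 penalty."""
    residual = X.dot(theta) - y
    mse = np.mean(residual ** 2)
    l1 = np.sum(np.abs(theta))
    l2 = np.sum(theta ** 2)
    return mse + r * alpha * l1 + 0.5 * (1.0 - r) * alpha * l2

# Toy example: MSE = 2, l1 penalty = 3, squared l2 penalty = 5.
X_toy = np.array([[1.0, 0.0], [0.0, 1.0]])
y_toy = np.array([1.0, 0.0])
theta_toy = np.array([1.0, -2.0])

print(elastic_net_cost(theta_toy, X_toy, y_toy, alpha=0.5, r=0.0))  # ridge cost: 3.25
print(elastic_net_cost(theta_toy, X_toy, y_toy, alpha=0.5, r=1.0))  # lasso cost: 3.5
print(elastic_net_cost(theta_toy, X_toy, y_toy, alpha=0.5, r=0.5))  # in between: 3.375
```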


In [3]:
### Implementing Elastic Net from Scratch ### 

import numpy as np
class ElasticRegression() :
    def __init__( self, learning_rate, iterations, l1_penality, l2_penality ) :
          
        self.learning_rate = learning_rate
        self.iterations = iterations
        self.l1_penality = l1_penality
        self.l2_penality = l2_penality
              
    def fit( self, X, Y ):
        self.m, self.n = X.shape
        self.W = np.zeros( self.n )
        self.b = 0
        self.X = X
        self.Y = Y
        for i in range( self.iterations ) :
            self.update_weights()
        return self
      
    def update_weights( self ) :
        Y_pred = self.predict( self.X )
        residual = self.Y - Y_pred
        # Subgradient of the l1 term: +1 for positive weights, -1 otherwise
        l1_sign = np.where( self.W > 0, 1.0, -1.0 )
        # Squared-error gradient plus the l1 and l2 penalty gradients;
        # both penalties are divided by m along with the data term
        dW = ( - 2 * self.X.T.dot( residual ) + self.l1_penality * l1_sign
               + 2 * self.l2_penality * self.W ) / self.m
        db = - 2 * np.sum( residual ) / self.m
        # Gradient descent step
        self.W = self.W - self.learning_rate * dW
        self.b = self.b - self.learning_rate * db
        return self
          
    def predict( self, X ) :
        return X.dot( self.W ) + self.b

In [4]:
### Implementation of Elastic Net in Sklearn ### Recommended 
from sklearn.linear_model import ElasticNet
elastic_net = ElasticNet(alpha=0.1, l1_ratio=0.5)
elastic_net.fit(X, y)
elastic_net.predict([[1.5]])

array([4.69453394])
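
In scikit-learn the mix ratio **r** is the `l1_ratio` parameter. Rather than fixing it by hand, it can be chosen by cross-validation with `ElasticNetCV`. The sketch below is one possible setup, not a recommendation: the candidate ratios, the seed, and the `*_cv` names are assumptions made for this example, and the data is regenerated with the same recipe as above so the block runs on its own.

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Same kind of noisy quadratic toy data as above, with a fixed seed for repeatability.
rng = np.random.default_rng(42)
X_cv = 6 * rng.random((100, 1)) - 3
y_cv = 0.5 * X_cv[:, 0] ** 2 + X_cv[:, 0] + 2 + rng.normal(size=100)

# Hypothetical grid of mix ratios; alpha values are picked automatically along a path.
candidate_ratios = [0.1, 0.5, 0.7, 0.9, 0.95, 1.0]
search = ElasticNetCV(l1_ratio=candidate_ratios, cv=5)
search.fit(X_cv, y_cv)

print("best l1_ratio:", search.l1_ratio_)
print("best alpha:", search.alpha_)
print("prediction at 1.5:", search.predict([[1.5]]))
```

Note that scikit-learn scales the squared-error term by $\frac{1}{2m}$ rather than $\frac{1}{m}$, so its `alpha` values are not directly comparable to the $\alpha$ in the formula above.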