#**Lasso Regression**
Lasso regression is closely related to Ridge regression, but it has one major benefit: it directly helps with feature reduction. "The key difference between these techniques is that Lasso shrinks the less important feature’s coefficient to zero thus, removing some feature altogether. So, this works well for feature selection in case we have a huge number of features" (Towards Data Science).

In ridge regression we had the following formula:

>>$\min_{\beta}\,(y-X\beta)^T(y-X\beta) +\lambda(\beta^T\beta-c)$ which gives

>>$\hat{\beta}_R={(X^TX+{\lambda}I)}^{-1}X^Ty$

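For Lasso regression the formula changes slightly: we replace $\lambda(\beta^T\beta-c)$ with $\lambda(\sum_j\lvert\beta_j\rvert-c)$. In other words, the penalty on the squared coefficients is swapped for a penalty on their absolute values.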

The impact of this change is that some of the parameter estimates $\beta$ will be set exactly to zero, which helps us with variable selection. There is a really nice explanation [here](https://stats.stackexchange.com/questions/74542/why-does-the-lasso-provide-variable-selection). In it, the authors explain that when the cost function is differentiated, the absolute-value term drives the optimisation routine to set certain parameters to zero.
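The effect of the absolute-value term can be seen in the simplest possible case. The sketch below assumes a single coefficient with unpenalized estimate $z$: minimising $\frac{1}{2}(z-\beta)^2+\lambda\lvert\beta\rvert$ gives the "soft-thresholding" rule, while minimising $\frac{1}{2}(z-\beta)^2+\frac{\lambda}{2}\beta^2$ gives a simple rescaling. The function names and the example values are illustrative only and are not part of the notes above.

```python
import numpy as np

def soft_threshold(z, lam):
    """L1 case: shrink z toward zero by lam; any |z| <= lam becomes exactly 0."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def ridge_shrink(z, lam):
    """L2 case: rescale z by 1 / (1 + lam); a nonzero z never becomes 0."""
    return z / (1.0 + lam)

# Illustrative unpenalized estimates (made-up values).
z = np.array([-2.0, -0.3, 0.1, 0.8, 3.0])

lasso_est = soft_threshold(z, lam=0.5)
ridge_est = ridge_shrink(z, lam=0.5)

print("lasso-style:", lasso_est)
print("ridge-style:", ridge_est)
```

The two small estimates are set exactly to zero by the L1 rule, whereas the L2 rule only makes every estimate smaller.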

#**Elastic Net**
Elastic Net is a combination of the L1 and L2 regularization techniques.
Generally, Lasso will eliminate many features and reduce overfitting in your linear model. Ridge will reduce the impact of features that are not important in predicting your y values. Elastic Net combines feature elimination from Lasso and feature coefficient reduction from the Ridge model to improve your model's predictions. We have completed a small example of it at the end of these notes.
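For reference, the objective that scikit-learn documents for its `ElasticNet` estimator can be written as below. Here $n$ is the number of samples, $\alpha$ corresponds to the `alpha` argument and $\rho$ to `l1_ratio`; the scaling factors follow the scikit-learn documentation, not the ridge formula used earlier in these notes.

>>$\min_{\beta}\ \frac{1}{2n}\lVert y-X\beta\rVert_2^2+\alpha\rho\lVert\beta\rVert_1+\frac{\alpha(1-\rho)}{2}\lVert\beta\rVert_2^2$

Setting $\rho=1$ leaves only the L1 (Lasso) term, and setting $\rho=0$ leaves only the L2 (Ridge) term.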


Let's look at the data we had in the ridge regression step and see how it performs under lasso regression.



In [1]:
### Generator for artificial dataset.

import numpy as np
n_samples, n_features = 10, 5
rng = np.random.RandomState(0)
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)
X[:,4]=2.5*X[:,2]+2.2*X[:,3]+(X[:,4]/100)

Retrieve the data and set up the Standard Error (SE) function

In [2]:
import numpy as np
n_samples, n_features = 10, 5
X=np.array([[ 0.14404357 , 1.45427351,  0.76103773,  0.12167502,  2.17471798],
   [ 0.33367433,  1.49407907, -0.20515826,  0.3130677,   0.16731233],
   [-2.55298982,  0.6536186,   0.8644362,  -0.74216502,  0.551025  ],
   [-1.45436567,  0.04575852, -0.18718385,  1.53277921,  2.91884823],
   [ 0.15494743,  0.37816252, -0.88778575, -1.98079647, -6.58069572],
   [ 0.15634897,  1.23029068,  1.20237985, -0.38732682,  2.1508076 ],
   [-1.04855297, -1.42001794, -1.70627019,  1.9507754,   0.02093387],
   [-0.4380743,  -1.25279536,  0.77749036, -1.61389785, -1.60897678],
   [-0.89546656,  0.3869025,  -0.51080514, -1.18063218, -3.87468547],
   [ 0.42833187,  0.06651722,  0.3024719,  -0.63432209,-0.64295627]])

y=[ 1.76405235,0.40015721,  0.97873798,  2.2408932,   1.86755799, -0.97727788,  0.95008842, -0.15135721, -0.10321885,  0.4105985 ]
#X2=X[0:10,0:4]
#print(X2)
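def se(X, mse):
    """Approximate standard error of each coefficient from the residual MSE."""
    n_cols = X.shape[1]
    SE = np.zeros(n_cols)
    for i in range(n_cols):
        # sqrt(MSE / sum of squared deviations of column i from its mean)
        SE[i] = np.sqrt(mse / np.square(X[:, i] - np.mean(X[:, i])).sum())
    return SE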

We will now implement Lasso regression. You should vary the $\alpha$ value and see how your parameters change.

In [3]:
from sklearn.linear_model import Lasso
from sklearn.linear_model import LinearRegression
from statsmodels.stats.outliers_influence import variance_inflation_factor
import pandas as pd


vif = pd.DataFrame()
vif["VIF Factor"] = [variance_inflation_factor(X, i) for i in range(X.shape[1])]

print(vif)

print(np.corrcoef(X.transpose()))
lasso = Lasso(alpha=0.01,max_iter=10000)
lasso.fit(X,y)

mse=np.square(y-lasso.predict(X)).sum()/(n_samples-n_features)
print('Standard errors are ',se(X,mse))
print('r_squared:',lasso.score(X, y))
print('reg coefficients : ',lasso.coef_)

      VIF Factor
0       3.553885
1       1.265923
2  218847.010121
3  344541.755303
4  369820.450039
[[ 1.          0.28495567  0.050991   -0.22445704 -0.17770115]
 [ 0.28495567  1.          0.43815271 -0.12449802  0.22222447]
 [ 0.050991    0.43815271  1.         -0.35107814  0.44352535]
 [-0.22445704 -0.12449802 -0.35107814  1.          0.68349441]
 [-0.17770115  0.22222447  0.44352535  0.68349441  1.        ]]
Standard errors are  [0.40738126 0.38854141 0.43184097 0.30921653 0.13441583]
r_squared: 0.25336565588743165
reg coefficients :  [-0.25560658  0.24991356 -0.44210645  0.11113209 -0.        ]
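One way to follow the suggestion to vary $\alpha$ is to loop over a few values and count how many coefficients are set to zero. The sketch below rebuilds the dataset from the generator cell so that it runs on its own; the grid of `alphas` is an arbitrary illustration, not a tuned choice, and `n_zero` is a name introduced here.

```python
import numpy as np
from sklearn.linear_model import Lasso

# Rebuild the artificial dataset from the generator cell.
n_samples, n_features = 10, 5
rng = np.random.RandomState(0)
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)
X[:, 4] = 2.5 * X[:, 2] + 2.2 * X[:, 3] + (X[:, 4] / 100)

alphas = [0.001, 0.01, 0.1, 1.0, 10.0]  # illustrative grid only
n_zero = []
for alpha in alphas:
    model = Lasso(alpha=alpha, max_iter=100000)
    model.fit(X, y)
    # Count the coefficients that the L1 penalty has set exactly to zero.
    n_zero.append(int(np.sum(model.coef_ == 0)))
    print(f"alpha={alpha:<6} coef={np.round(model.coef_, 3)} zeros={n_zero[-1]}")
```

A sufficiently large $\alpha$ sets every coefficient to zero, which leaves only the intercept in the model.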


The next piece of code is a really simple implementation of Elastic Net. We have balanced the ratio between Ridge and Lasso regularization at 50:50 (`l1_ratio=0.5`).

In [5]:
from sklearn.linear_model import ElasticNet
regr = ElasticNet(random_state=0,alpha=0.01,l1_ratio=0.5,fit_intercept=True,max_iter=100000)
regr.fit(X, y)

print('r_squared :',regr.score(X, y))
print('reg coefficients : ',regr.coef_)


r_squared : 0.25377415238976153
reg coefficients :  [-0.26193562  0.25858362 -0.44915084  0.11218117 -0.        ]
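To see what the `l1_ratio` setting does, it can be moved away from 0.5 in the same way as $\alpha$. The sketch below uses an arbitrary set of ratios and rebuilds the dataset from the generator cell so that it is self-contained; `coefs` is a name introduced here. With `l1_ratio=1.0` the Elastic Net penalty is pure L1, so the fit should coincide with `Lasso` at the same `alpha`.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso

# Rebuild the artificial dataset from the generator cell.
n_samples, n_features = 10, 5
rng = np.random.RandomState(0)
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)
X[:, 4] = 2.5 * X[:, 2] + 2.2 * X[:, 3] + (X[:, 4] / 100)

coefs = {}
for l1_ratio in [0.1, 0.5, 0.9, 1.0]:  # illustrative values only
    enet = ElasticNet(alpha=0.01, l1_ratio=l1_ratio, max_iter=100000)
    enet.fit(X, y)
    coefs[l1_ratio] = enet.coef_
    print(f"l1_ratio={l1_ratio}: {np.round(enet.coef_, 4)}")

# Pure-L1 Elastic Net compared with Lasso at the same alpha.
lasso = Lasso(alpha=0.01, max_iter=100000).fit(X, y)
print("Lasso:       ", np.round(lasso.coef_, 4))
```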


#**Review**

We can see that it is quite easy to run either ridge regression or lasso regression. It is probably best to start your feature selection process with lasso regression rather than ridge regression, as it has a definite mechanism for dropping a variable. Both require you to choose a value for $\alpha$, which is a disadvantage. However, the Elastic Net approach could be a good compromise, as ridge regression is better at handling over-specified problems.

In this step you will see that a number of the parameters have a very wide SE. The Lasso analysis above shows us that one variable is set to zero, but the rest are not significantly different from zero if you examine the confidence intervals ($\hat{\beta} \pm 1.96 \times SE$).
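The intervals can be checked directly from the numbers printed by the Lasso cell. The sketch below copies the coefficients and standard errors from that output and applies $\hat{\beta} \pm 1.96 \times SE$; it relies on the simple SE function used in these notes, so treat the intervals as a rough guide rather than exact inference. The variable names are introduced here for illustration.

```python
import numpy as np

# Values copied from the printed output of the Lasso cell (alpha=0.01).
coef = np.array([-0.25560658, 0.24991356, -0.44210645, 0.11113209, -0.0])
se_vals = np.array([0.40738126, 0.38854141, 0.43184097, 0.30921653, 0.13441583])

# Approximate 95% interval for each coefficient.
lower = coef - 1.96 * se_vals
upper = coef + 1.96 * se_vals
contains_zero = (lower <= 0) & (upper >= 0)

for i in range(len(coef)):
    print(f"beta_{i}: [{lower[i]: .3f}, {upper[i]: .3f}]  contains zero: {contains_zero[i]}")
```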

Can you explain this? It is a simple answer. Talk about this on the comments board and see if you can come to a consensus.

I would also suggest you take the 4 pieces of code from the last 2 steps and create training and test sets. See how your predictions work out with bigger datasets. This [link](https://towardsdatascience.com/ridge-and-lasso-regression-a-complete-guide-with-python-scikit-learn-e20e34bcbf0b) may help you to generate your code.
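As a starting point for that exercise, here is a minimal sketch of a train/test comparison. It assumes a synthetic dataset from scikit-learn's `make_regression` in place of real data, and the sample size, noise level and `alpha` values are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.model_selection import train_test_split

# Synthetic "bigger" dataset: 500 rows, 20 features, only 5 of them informative.
X_big, y_big = make_regression(n_samples=500, n_features=20, n_informative=5,
                               noise=10.0, random_state=0)

# Hold out 25% of the rows for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X_big, y_big, test_size=0.25, random_state=0)

models = {
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=1.0, max_iter=100000),
    "elastic_net": ElasticNet(alpha=1.0, l1_ratio=0.5, max_iter=100000),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # Store R^2 on the training rows and on the held-out rows.
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name:12s} train R^2={scores[name][0]:.3f}  test R^2={scores[name][1]:.3f}")
```

Comparing the training and test scores for each model shows how well it generalises beyond the rows it was fitted on.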


