# LASSO Regression

`LASSO` - Least Absolute Shrinkage and Selection Operator

A regression technique with built-in regularization. Similar to [ridge regression](../regression/ridge_regression.ipynb), it penalizes the magnitude of the feature coefficients while minimizing the error between the predicted and actual observations. It's mainly used on datasets with <em>many</em> features (e.g. in the millions), since its sparse solution provides both shrinkage and model (variable) selection.

It performs `L1 regularization`: it adds a penalty proportional to the sum of the absolute values of the coefficients.

Assuming the observations are centered, `LASSO` estimates are computed by:<br>

$$\hat{\beta} = \underset{\beta}{\arg\min}\ \|y-X\beta\|_2^2 \quad \text{subject to} \quad \|\beta\|_1 \leq K$$

which is equivalent to the penalized form (each bound $K$ corresponds to some $\lambda \ge 0$):
$$\hat{\beta} = \underset{\beta}{\arg\min}\ \|y-X\beta\|_2^2 + \lambda\|\beta\|_1$$
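
The penalized objective above can be minimized one coefficient at a time. The block below is a minimal, dependency-free sketch using cyclic coordinate descent, which is one common approach and not necessarily what any particular library does. The function names, the fixed iteration count, and the toy data are all assumptions made for illustration.

```python
def soft_threshold(rho, t):
    """Shrink rho toward 0 by t; return exactly 0.0 when |rho| <= t."""
    if rho > t:
        return rho - t
    if rho < -t:
        return rho + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iters=200):
    """Minimize ||y - X b||_2^2 + lam * ||b||_1 by cyclic coordinate descent.

    X is a list of rows and y a list of targets; both are assumed centered.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    # Squared norm of each column, x_j^T x_j.
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(p)]
    for _ in range(n_iters):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column carries no information
            # Inner product of column j with the residual that excludes feature j.
            rho = 0.0
            for i in range(n):
                pred_without_j = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
            # The threshold is lam / 2 because the squared error is not halved.
            beta[j] = soft_threshold(rho, lam / 2.0) / col_sq[j]
    return beta


# Toy data (made up for illustration): y = 2 * x1 + 0.5 * x2, columns centered.
X = [[-2.0, 1.0], [-1.0, -1.0], [0.0, 0.0], [1.0, 1.0], [2.0, -1.0]]
y = [-3.5, -2.5, 0.0, 2.5, 3.5]

beta_ols = lasso_coordinate_descent(X, y, lam=0.0)    # no penalty
beta_lasso = lasso_coordinate_descent(X, y, lam=4.0)  # L1 penalty active
print("lambda=0:", beta_ols)
print("lambda=4:", beta_lasso)
```

On this toy data the unpenalized fit recovers roughly `(2, 0.5)`, while `lam=4.0` shrinks the first coefficient to about `1.7` and sets the second to exactly `0`. Libraries often scale the objective differently (for example, dividing the squared error by the number of samples), so a given `lam` here will not necessarily match the penalty parameter elsewhere.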


* shrinkage: estimated lasso coefficients are shrunken towards `0`
* sparsity: some coefficients are exactly `0`
* $\lambda \ge 0$ controls sparsity: larger values set more coefficients to `0`, and $\lambda = 0$ recovers ordinary least squares
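
The three properties above are easiest to see in the special case of an orthonormal design ($X^TX = I$), where the LASSO solution is the least-squares coefficient soft-thresholded at $\lambda/2$. The sketch below assumes that special case; the coefficient values and the function names are hypothetical, chosen only to show the effect.

```python
def soft_threshold(z, t):
    """Move z toward 0 by t, returning exactly 0.0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_orthonormal(beta_ols, lam):
    """LASSO solution for ||y - X b||_2^2 + lam * ||b||_1 when X^T X = I."""
    return [soft_threshold(b, lam / 2.0) for b in beta_ols]


# Hypothetical least-squares coefficients, chosen only for illustration.
beta_ols = [3.0, -1.5, 0.4, -0.2]

zero_counts = []
for lam in [0.0, 0.5, 1.0, 4.0, 8.0]:
    beta = lasso_orthonormal(beta_ols, lam)
    n_zero = sum(1 for b in beta if b == 0.0)
    zero_counts.append(n_zero)
    print(f"lambda={lam}: beta={beta}, zeros={n_zero}")
```

Every nonzero coefficient moves toward `0` by the same amount (shrinkage), the small ones land exactly on `0` (sparsity), and the number of zeros grows as `lam` increases.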

#### Resources:
* [A Complete Tutorial on Ridge and Lasso Regression in Python](https://chandlerfang.blog/2016/09/30/a-complete-tutorial-on-ridge-and-lasso-regression-in-python/)
* [ISyE8803: Topics on High-Dimensional Data Analytics — Module 4](https://courses.edx.org/courses/course-v1:GTx+ISYE8803+2T2019/course/#block-v1:GTx+ISYE8803+2T2019+type@chapter+block@223667a47f58432ea40f272c8ed71e11)