# The Lasso Method

In [1]:
import numpy as np
import matplotlib.pyplot as plt

%matplotlib notebook

$\newcommand{\norm}[1]{\left\lVert#1\right\rVert}$

The lasso method is a constrained version of linear regression built on the ordinary least squares method. Recall that in this model, we are given feature vectors $\{x_i\}_{i=1}^N$, where $x_i = (x_{i1}, \ldots, x_{iM})^T \in \mathbb{R}^M$, together with their responses $\{y_i\}_{i=1}^N$, and we seek to fit them with a linear parametrization of the form

$$y_i = \beta_0 + \sum_{j=1}^M \beta_j x_{ij}$$

We fit our model by choosing the parameter vector $\beta = (\beta_0, \beta_1, \ldots, \beta_M)^T \in \mathbb{R}^{M+1}$ so as to minimize the residual sum of squares error given as

$$RSS(\beta) = \sum_{i=1}^N (y_i - \beta_0 - \sum_{j=1}^M \beta_jx_{ij})^2$$
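As a minimal sketch of this fit, the residual sum of squares can be evaluated and minimized with NumPy's least-squares solver. The synthetic data, the random seed, and the names `rss`, `true_beta`, and `X_aug` are illustrative assumptions, not part of the original notebook.

```python
import numpy as np

def rss(beta, X, y):
    """Residual sum of squares; beta[0] plays the role of the intercept beta_0."""
    residuals = y - beta[0] - X @ beta[1:]
    return float(np.sum(residuals ** 2))

# Hypothetical synthetic data: N samples, M features
rng = np.random.default_rng(0)
N, M = 50, 3
X = rng.normal(size=(N, M))
true_beta = np.array([1.0, 2.0, -1.0, 0.5])  # (beta_0, beta_1, ..., beta_M)
y = true_beta[0] + X @ true_beta[1:] + 0.1 * rng.normal(size=N)

# Prepend a column of ones so the intercept is fitted with the other coefficients
X_aug = np.column_stack([np.ones(N), X])
beta_hat, *_ = np.linalg.lstsq(X_aug, y, rcond=None)

print("fitted beta:", beta_hat)
print("RSS at fitted beta:", rss(beta_hat, X, y))
```

Because `beta_hat` minimizes the RSS, its RSS should not exceed the value at any other parameter vector, including the `true_beta` used to generate the data.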

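Collapsing $\beta_0$ into $\beta$, extending each $x_i$ from a vector in $\mathbb{R}^{M}$ to one in $\mathbb{R}^{M+1}$ by prepending a constant entry $1$, and introducing a normalization constant $\frac{1}{2N}$, we can rewrite the optimization problem as a minimization over $\beta$ of the following rescaled $RSS$ function. Here $X \in \mathbb{R}^{N \times (M+1)}$ is the matrix whose $i$-th row is the extended $x_i^T$, and $y = (y_1, \ldots, y_N)^T$.

$$RSS(\beta) = \frac{1}{2N}\norm{y - X\beta}_2^2 = \frac{1}{2N}\sum_{i=1}^N (y_i - x_i \cdot \beta)^2$$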
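The agreement between the collapsed vector form and the explicit coordinate sum can be checked numerically. This is a small sketch under assumed random data; `normalized_rss`, `X_aug`, and the chosen `beta` are hypothetical names and values introduced only for illustration.

```python
import numpy as np

def normalized_rss(beta, X_aug, y):
    """(1 / 2N) * squared 2-norm of y - X_aug @ beta, intercept folded into beta."""
    N = len(y)
    return float(np.linalg.norm(y - X_aug @ beta) ** 2 / (2 * N))

rng = np.random.default_rng(1)
N, M = 20, 2
X = rng.normal(size=(N, M))
y = rng.normal(size=N)
beta = np.array([0.5, -1.0, 2.0])  # (beta_0, beta_1, beta_2)

# Extend each feature vector with a leading 1 so that beta_0 acts as the intercept
X_aug = np.column_stack([np.ones(N), X])

# Explicit coordinate form: sum over samples of (y_i - beta_0 - sum_j beta_j x_ij)^2
explicit = sum(
    (y[i] - beta[0] - sum(beta[j + 1] * X[i, j] for j in range(M))) ** 2
    for i in range(N)
) / (2 * N)

print(np.isclose(normalized_rss(beta, X_aug, y), explicit))
```

The factor $\frac{1}{2N}$ rescales the objective but does not change which $\beta$ minimizes it.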

The lasso method imposes a single constraint on this optimization: we require that $\norm{\beta}_1 = \sum_{j=0}^M |\beta_j| \leq t$ for some fixed bound $t \geq 0$.
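To make the constraint concrete, the sketch below evaluates the $\ell_1$ norm of a coefficient vector and tests it against a bound $t$. The helper names `l1_norm` and `is_feasible` and the coefficient values are hypothetical, chosen only to illustrate the check.

```python
import numpy as np

def l1_norm(beta):
    """Sum of absolute values of the coefficients."""
    return float(np.sum(np.abs(beta)))

def is_feasible(beta, t):
    """True when beta satisfies the lasso constraint ||beta||_1 <= t."""
    return l1_norm(beta) <= t

# Hypothetical coefficient vector, e.g. from an unconstrained least-squares fit
beta_ols = np.array([0.5, -2.0, 0.0, 1.5])

print(l1_norm(beta_ols))           # 0.5 + 2.0 + 0.0 + 1.5
print(is_feasible(beta_ols, 5.0))  # bound is loose for this vector
print(is_feasible(beta_ols, 3.0))  # bound excludes this vector
```

If an unconstrained least-squares minimizer already satisfies the bound, it also solves the constrained problem; the constraint changes the answer only when $t$ is smaller than that minimizer's $\ell_1$ norm.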