# Regularization
In this section, we will introduce two ways to regularize regression: **Ridge regression** and **Lasso regression**. 

Consider the following regression problem:
$$
\begin{aligned}
\min_{\bm{\omega}} \quad &  \|{y} - \bm{X}\bm{\omega}\|_{2}^{2} \\
\end{aligned}
$$
where $\bm{X} \in \mathbb{R}^{m \times d}$ is the data matrix, $y \in \mathbb{R}^m$ is the response vector, and $\bm{\omega} \in \mathbb{R}^d$ is the weight vector to be learned.
We now restrict the norm of $\bm{\omega}$, which turns the problem into a constrained optimization problem:
$$
\begin{aligned}
\min_{\bm{\omega}} \quad &  \|{y} - \bm{X}\bm{\omega}\|_{2}^{2} \quad \text{s.t.} \quad \| \bm{\omega}\|_{2}^{2} \leq C  \\
\end{aligned}
$$
where $C$ is a positive constant. The above problem can be solved with the method of Lagrange multipliers. The Lagrangian is:
$$
\begin{aligned}
L(\bm{\omega}, \lambda) = \|{y} - \bm{X}\bm{\omega}\|_{2}^{2} + \lambda (\| \bm{\omega}\|_{2}^{2} - C)
\end{aligned}
$$
where $\lambda \geq 0$ is the Lagrange multiplier, whose value depends on $C$. Constraining the squared L2-norm of $\bm{\omega}$, as above, gives **Ridge regression**, while constraining the L1-norm $\| \bm{\omega}\|_{1}$ instead gives **Lasso regression**.

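For Ridge regression with a fixed $\lambda$, we can derive the solution by taking the derivative of $L(\bm{\omega}, \lambda)$ with respect to $\bm{\omega}$ and setting it to zero, i.e. $-2\bm{X}^{\top}(y - \bm{X}\bm{\omega}) + 2\lambda\bm{\omega} = 0$, which gives: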
$$
\begin{aligned}
\bm{\omega} = (\bm{X}^{\top}\bm{X} + \lambda \bm{I})^{-1}\bm{X}^{\top}y
\end{aligned}
$$
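
The closed-form solution above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name `ridge_closed_form`, the synthetic data, and the chosen values of `lam` are all assumptions made for the example. It solves the linear system $(\bm{X}^{\top}\bm{X} + \lambda \bm{I})\bm{\omega} = \bm{X}^{\top}y$ rather than forming the inverse explicitly, which is usually the more numerically stable choice.

```python
import numpy as np

def ridge_closed_form(X, y, lam):
    """Return the solution of (X^T X + lam * I) w = X^T y (hypothetical helper)."""
    d = X.shape[1]
    # Solve the linear system instead of computing the matrix inverse.
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Synthetic data, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=50)

w_small = ridge_closed_form(X, y, lam=1e-8)   # nearly ordinary least squares
w_large = ridge_closed_form(X, y, lam=100.0)  # weights shrunk toward zero

print("lambda = 1e-8:", w_small)
print("lambda = 100 :", w_large)
```

A larger $\lambda$ corresponds to a smaller $C$ in the constrained problem, so the norm of the returned weight vector decreases as `lam` grows.
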
More information on Ridge regression can be found in:
- [Ridge regression - Wikipedia](https://en.wikipedia.org/wiki/Ridge_regression).


For Lasso regression, we cannot take the derivative directly, since $\| \bm{\omega}\|_{1}$ is not differentiable wherever a component of $\bm{\omega}$ is zero. Instead, we can use an iterative method such as coordinate descent, LARS (least-angle regression), ISTA (Iterative Shrinkage-Thresholding Algorithm), which is based on the proximal gradient method, or its accelerated variant FISTA.<br>
More information on Lasso regression can be found in:
- [LASSO - Wikipedia](https://en.wikipedia.org/wiki/Lasso_(statistics)).
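
As an illustration of one of the methods named above, here is a minimal ISTA sketch. It assumes the penalized form of the problem, $\min_{\bm{\omega}} \|y - \bm{X}\bm{\omega}\|_{2}^{2} + \lambda \|\bm{\omega}\|_{1}$, with a fixed step size of $1/L$, where $L = 2\sigma_{\max}(\bm{X})^{2}$ is a Lipschitz constant of the gradient of the squared-error term. The names `soft_threshold` and `lasso_ista`, the iteration count, and the synthetic data are assumptions made for the example.

```python
import numpy as np

def soft_threshold(v, tau):
    """Proximal operator of tau * ||.||_1: shrink each entry toward zero by tau."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def lasso_ista(X, y, lam, n_iter=500):
    """Minimize ||y - X w||_2^2 + lam * ||w||_1 with ISTA (hypothetical helper)."""
    # Lipschitz constant of the gradient of the smooth term ||y - X w||_2^2.
    L = 2.0 * np.linalg.norm(X, 2) ** 2
    step = 1.0 / L
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * X.T @ (X @ w - y)                    # gradient step on the smooth part
        w = soft_threshold(w - step * grad, step * lam)   # proximal step on the L1 part
    return w

def lasso_objective(X, y, w, lam):
    return np.sum((y - X @ w) ** 2) + lam * np.sum(np.abs(w))

# Synthetic data with a sparse ground truth, assumed purely for illustration.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
w_true = np.array([3.0, 0.0, 0.0, -2.0, 0.0])
y = X @ w_true + 0.1 * rng.normal(size=100)

w_hat = lasso_ista(X, y, lam=20.0)
print("estimated weights:", w_hat)
```

The soft-thresholding step is what allows entries of the estimate to become exactly zero, which is the source of the sparsity associated with Lasso. FISTA uses the same two steps and adds a momentum term to speed up convergence.
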
