<a href="https://colab.research.google.com/gist/jonghank/e0032bfb083e451628e989a070b0c375/least_squares.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Least squares, least norm, and constrained least squares problems

$$
\newcommand{\eg}{{\it e.g.}}
\newcommand{\ie}{{\it i.e.}}
\newcommand{\argmin}{\operatornamewithlimits{argmin}}
\newcommand{\mc}{\mathcal}
\newcommand{\mb}{\mathbb}
\newcommand{\mf}{\mathbf}
\newcommand{\minimize}{{\text{minimize}}}
\newcommand{\diag}{{\text{diag}}}
\newcommand{\cond}{{\text{cond}}}
\newcommand{\rank}{{\text{rank }}}
\newcommand{\range}{{\mathcal{R}}}
\newcommand{\null}{{\mathcal{N}}}
\newcommand{\tr}{{\text{trace}}}
\newcommand{\dom}{{\text{dom}}}
\newcommand{\dist}{{\text{dist}}}
\newcommand{\R}{\mathbf{R}}
\newcommand{\SM}{\mathbf{S}}
\newcommand{\ball}{\mathcal{B}}
\newcommand{\bmat}[1]{\begin{bmatrix}#1\end{bmatrix}}
$$

__<div style="text-align: right"> ASE7030: Convex Optimization, Inha University. </div>__
_<div style="text-align: right"> Jong-Han Kim (jonghank@inha.ac.kr) </div>_


<br>

## Least squares


<br>

### Least squares problems

$$
\begin{aligned}
  \underset{x}{\minimize} \quad& \| Ax-b \|_2^2
\end{aligned}
$$

with $A$ being tall and full rank (columns of $A$ linearly independent). The optimal solution is given by

\begin{align*}
x^* &= A^\dagger b \\
&= (A^TA)^{-1}A^Tb
\end{align*}
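
As a quick numerical sanity check, the closed-form expression above can be compared against library routines. The following is a minimal sketch, assuming a randomly generated tall matrix (the sizes `m`, `n` and the random data are illustrative choices, not part of the original notes):

```python
import numpy as np

# Illustrative random problem data (assumption: not from the original notes)
rng = np.random.default_rng(0)
m, n = 20, 5
A = rng.standard_normal((m, n))   # tall; full column rank with probability one
b = rng.standard_normal(m)

# Closed form: x* = (A^T A)^{-1} A^T b, computed by solving the normal equations
x_normal = np.linalg.solve(A.T @ A, A.T @ b)

# Pseudo-inverse form: x* = A^dagger b
x_pinv = np.linalg.pinv(A) @ b

# Library least squares solver
x_lstsq, *_ = np.linalg.lstsq(A, b, rcond=None)

print(np.allclose(x_normal, x_pinv), np.allclose(x_normal, x_lstsq))

# At the optimum the residual is orthogonal to the columns of A
print(np.linalg.norm(A.T @ (A @ x_normal - b)))
```

The three solutions are expected to agree up to floating-point tolerance. In practice `np.linalg.lstsq` is usually preferable to forming $A^TA$ explicitly, since the latter squares the condition number.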

Questions:
- Why $A^\dagger b$?
- What if $A$ is rank deficient (columns of $A$ linearly dependent)?


<br>

### Regularized least squares problems

$$
\begin{aligned}
  \underset{x}{\minimize} \quad& \| Ax-b \|_2^2 + \lambda \|x\|_2^2
\end{aligned}
$$

with some _hyper-parameter_ $\lambda>0$. This is equivalent to

$$
\begin{aligned}
  \underset{x}{\minimize} \quad& \left\| \bmat{A \\ \sqrt{\lambda}I}x-\bmat{b\\0} \right\|_2^2
\end{aligned}
$$

Hence the optimal solution is given by: 
$$
x^* = \left( A^TA+\lambda I\right)^{-1}A^Tb
$$

Note that this formula holds for any $A$ (no full rank assumption is required), since $A^TA+\lambda I$ is positive definite, and hence invertible, whenever $\lambda>0$.
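
The equivalence with the stacked least squares problem can be checked numerically. Below is a minimal sketch, assuming a random wide $A$ (so that $A^TA$ alone is singular) and an arbitrary illustrative value `lam = 0.1` for $\lambda$:

```python
import numpy as np

# Illustrative random problem data (assumption: not from the original notes)
rng = np.random.default_rng(1)
m, n = 4, 6                        # wide A, so A^T A is singular
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
lam = 0.1                          # hyper-parameter lambda > 0 (arbitrary choice)

# Closed form: x* = (A^T A + lambda I)^{-1} A^T b
x_closed = np.linalg.solve(A.T @ A + lam * np.eye(n), A.T @ b)

# Equivalent stacked least squares problem
A_stack = np.vstack([A, np.sqrt(lam) * np.eye(n)])
b_stack = np.concatenate([b, np.zeros(n)])
x_stack, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)

print(np.allclose(x_closed, x_stack))
```

The stacked matrix always has linearly independent columns because of the $\sqrt{\lambda}I$ block, which is why the wide $A$ causes no trouble here.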

Questions:
- What happens if $\lambda\rightarrow \infty$?
- What happens if $\lambda\rightarrow 0$ when $A$ is tall full rank?
- What happens if $\lambda\rightarrow 0$ when $A$ is wide full rank?




<br> 

### Least norm problems

$$
\begin{aligned}
  \underset{x}{\minimize} \quad& \|x \|_2^2 \\
  \text{subject to} \quad& Ax=b
\end{aligned}
$$

with $A$ being wide and full rank (rows of $A$ linearly independent). The optimal solution is given by

\begin{align*}
  x^* &= A^\dagger b \\
  &= A^T(AA^T)^{-1}b
\end{align*}
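
The least norm formula can be checked in the same way. The sketch below assumes a random wide matrix with illustrative sizes, and compares the solution against another feasible point built by adding a nullspace component:

```python
import numpy as np

# Illustrative random problem data (assumption: not from the original notes)
rng = np.random.default_rng(2)
m, n = 3, 8
A = rng.standard_normal((m, n))   # wide; full row rank with probability one
b = rng.standard_normal(m)

# Closed form: x* = A^T (A A^T)^{-1} b
x_ln = A.T @ np.linalg.solve(A @ A.T, b)

# Pseudo-inverse form: x* = A^dagger b
x_pinv = np.linalg.pinv(A) @ b
print(np.allclose(x_ln, x_pinv), np.allclose(A @ x_ln, b))

# Another feasible point: add a vector z from the nullspace of A
z = rng.standard_normal(n)
z -= A.T @ np.linalg.solve(A @ A.T, A @ z)   # project z onto null(A)
x_other = x_ln + z

# Both points satisfy Ax = b, but x_ln has the smaller norm
print(np.allclose(A @ x_other, b))
print(np.linalg.norm(x_ln), np.linalg.norm(x_other))
```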

Questions:
- Why $A^\dagger b$?
- What if $A$ is rank deficient (rows of $A$ linearly dependent)?




<br>

### Constrained least squares problems

$$
\begin{aligned}
  \underset{x}{\minimize} \quad& \| Ax-b \|_2^2 \\
  \text{subject to} \quad& Cx=d
\end{aligned}
$$

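with $A$ being full column rank and $C$ being full row rank. (These assumptions can be relaxed: it suffices that $C$ has linearly independent rows and the stacked matrix $\bmat{A \\ C}$ has linearly independent columns.)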
The optimal solution can be obtained by solving

$$
\bmat{A^TA & C^T \\ C & 0 }\bmat{x\\ \nu} = \bmat{A^Tb \\ d}
$$

where the first equation
$$
  A^TA x + C^T \nu = A^T b
$$
states that the gradient of $\| Ax-b \|_2^2$ at the optimum, $2A^T(Ax-b) = -2C^T\nu$, lies in $\range(C^T)$ and is therefore orthogonal to $\null(C)$, and the second equation
$$
  Cx = d
$$
states that the optimal point is feasible.
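
The two conditions can be verified numerically by assembling and solving the block system. This is a minimal sketch on random data; the sizes `m`, `n`, `p` and the SVD-based nullspace basis are illustrative assumptions:

```python
import numpy as np

# Illustrative random problem data (assumption: not from the original notes)
rng = np.random.default_rng(3)
m, n, p = 15, 6, 2
A = rng.standard_normal((m, n))   # full column rank with probability one
b = rng.standard_normal(m)
C = rng.standard_normal((p, n))   # full row rank with probability one
d = rng.standard_normal(p)

# Assemble and solve the block system for (x, nu)
KKT = np.block([[A.T @ A, C.T],
                [C, np.zeros((p, p))]])
rhs = np.concatenate([A.T @ b, d])
sol = np.linalg.solve(KKT, rhs)
x, nu = sol[:n], sol[n:]

# Feasibility: Cx = d
print(np.allclose(C @ x, d))

# Orthogonality: the gradient is orthogonal to a basis of null(C)
grad = 2 * A.T @ (A @ x - b)
_, _, Vt = np.linalg.svd(C)       # full SVD, Vt has shape (n, n)
N = Vt[p:].T                      # columns span null(C) when rank(C) = p
print(np.allclose(N.T @ grad, 0))
```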

The optimal solution can also be obtained by solving

$$
\bmat{0 & A^T & C^T \\ A & -I & 0 \\ C & 0 & 0} \bmat{x \\ y \\ \nu} = \bmat{0 \\ b \\ d}
$$

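where the second block row gives $y = Ax-b$ (the residual), and substituting it into the first block row recovers $A^TAx + C^T\nu = A^Tb$. This form is helpful especially when $A$ and $C$ are sparse, since it avoids forming $A^TA$, which can be much denser than $A$.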
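
A minimal sketch of the sparse formulation, assuming SciPy is available and using randomly generated sparse matrices (the sizes, densities, and the added rectangular identity that keeps the matrices well conditioned are all illustrative choices):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# Illustrative sparse problem data (assumption: not from the original notes)
rng = np.random.default_rng(4)
m, n, p = 200, 50, 5
A = sp.random(m, n, density=0.05, random_state=4, format="csr") \
    + sp.eye(m, n, format="csr")
C = sp.random(p, n, density=0.2, random_state=5, format="csr") \
    + sp.eye(p, n, format="csr")
b = rng.standard_normal(m)
d = rng.standard_normal(p)

# Augmented system in (x, y, nu); A^T A is never formed
K = sp.bmat([[None, A.T, C.T],
             [A, -sp.eye(m), None],
             [C, None, None]], format="csc")
rhs = np.concatenate([np.zeros(n), b, d])
sol = spla.spsolve(K, rhs)
x, y, nu = sol[:n], sol[n:n + m], sol[n + m:]

# Dense reference: the block system with A^T A
Ad, Cd = A.toarray(), C.toarray()
KKT = np.block([[Ad.T @ Ad, Cd.T],
                [Cd, np.zeros((p, p))]])
x_ref = np.linalg.solve(KKT, np.concatenate([Ad.T @ b, d]))[:n]

print(np.allclose(x, x_ref))        # same minimizer
print(np.allclose(y, A @ x - b))    # y is the residual
```

The augmented matrix is larger, $(n+m+p)\times(n+m+p)$, but it inherits the sparsity of $A$ and $C$ directly.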
