# Least Squares With Optimality Certificate

The least-squares formulation is a very common optimization problem arising in many fields of engineering and science.
Part of its success comes from its Bayesian interpretation (TODO).

Assuming $A$ is a linear operator whose matrix has size $n\times k$, the linear least-squares problem reads:

\begin{align*}
    \underset{x \in \mathbb{R}^k}{\min} \quad \frac{1}{2} \|Ax-y\|_2^2
\end{align*}

where $y \in \mathbb{R}^n$.
In the notebook "ForwardBackwardDual", we used the forward-backward proximal splitting framework to introduce an algorithm that computes a solution to both the primal and the dual problem. To apply this method to the simple linear least-squares problem, we now derive the required convex conjugates and proximity operators.
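Before turning to iterative schemes, it can help to have a reference solution. The sketch below assumes NumPy and uses illustrative random data (the sizes `n`, `k` and the seed are arbitrary choices, not taken from the original experiment); it solves the problem directly with `numpy.linalg.lstsq` and checks the normal equations $A^*(Ax - y) = 0$ that any minimizer satisfies.

```python
import numpy as np

# Hypothetical problem sizes and random data, for illustration only.
rng = np.random.default_rng(0)
n, k = 50, 20
A = rng.standard_normal((n, k))  # matrix of the linear operator, size n x k
y = rng.standard_normal(n)       # observations live in R^n

# Reference solution of min_x 1/2 ||Ax - y||_2^2.
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)

# At a minimizer, the gradient A^T (Ax - y) vanishes (normal equations).
grad = A.T @ (A @ x_ref - y)
print(np.linalg.norm(grad))  # close to machine precision
```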

## Proximal splitting framework
First, we will use the following formulation for the Forward-Backward proximal splitting method:

\begin{align*}
    \underset{x \in \mathbb{R}^k}{\min} \quad f(x) + g(Ax)
\end{align*}

where $f$ and $g$ are convex functions. In our case, $f$ is the trivial function $f(x)=0$ and $g(z) = \frac{1}{2} \|z-y\|_2^2$, so that $f(x) + g(Ax)$ is exactly the least-squares objective.

In this case, the Fenchel-Rockafellar theorem shows that one can solve the following _dual problem_

\begin{align*}
    \underset{u \in \mathbb{R}^n}{\min} \quad f^*(-A^* u) + g^*(u)
\end{align*}

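When $f$ is strongly convex (so that $f^*$ is differentiable), one can recover the unique solution $x^\star$ of the primal problem from a (not necessarily unique) solution $u^\star$ of the dual problem, as

\begin{align*}
    x^\star = \nabla f^*(-A^* u^\star)
\end{align*}

Here $f = 0$ is not strongly convex and $f^*$ is not differentiable, so this formula does not apply; instead, we use a primal-dual algorithm that computes primal and dual iterates simultaneously.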

## Chambolle-Pock algorithm

To solve this problem, we use the Chambolle-Pock algorithm, which reads:

Take initial estimates $x^{0}$ and $u^{0}$ of the primal and dual solutions, set $\tilde{x}^{0} = x^{0}$, and choose a parameter $\tau>0$, a second parameter $\sigma>0$ such that $\sigma \tau \|A\|^2 < 1$, and an extrapolation parameter $\rho \in [0,1]$ ($\rho = 1$ is the value for which Chambolle and Pock prove convergence). Then iterate, for $k=1,2,\ldots$:

\begin{align}
    u^{k} &= \mathrm{prox}_{\sigma g^*}\left( u^{k-1} + \sigma A\tilde{x}^{k-1} \right) \\
    x^{k} &= \mathrm{prox}_{\tau f}\left( x^{k-1} - \tau A^* u^{k} \right) \\
    \tilde{x}^{k} &= x^{k} + \rho \left( x^{k} - x^{k-1} \right)
\end{align}
  
Under these conditions, $x^{k}$ converges to a primal solution $x^\star$ and $u^{k}$ converges to a dual solution $u^\star$.
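The iterations above can be sketched in NumPy for our least-squares instance, assuming $f = 0$ (so $\mathrm{prox}_{\tau f}$ is the identity) and the closed-form $\mathrm{prox}_{\sigma g^*}$ derived later in this notebook. The step sizes $\tau = \sigma = 0.99/\|A\|$, the iteration count and the function name `chambolle_pock_least_squares` are hypothetical choices for illustration; the result is compared with the direct `numpy.linalg.lstsq` solution.

```python
import numpy as np


def chambolle_pock_least_squares(A, y, n_iter=2000, rho=1.0):
    """Chambolle-Pock iterations for min_x f(x) + g(Ax), with f = 0 and
    g(z) = 1/2 ||z - y||^2. A sketch, not a tuned solver."""
    n, k = A.shape
    norm_A = np.linalg.norm(A, 2)   # operator norm ||A|| (largest singular value)
    tau = sigma = 0.99 / norm_A     # hypothetical choice giving sigma*tau*||A||^2 < 1
    x = np.zeros(k)
    x_tilde = x.copy()              # x_tilde^0 = x^0
    u = np.zeros(n)
    for _ in range(n_iter):
        # Dual step, using the closed form prox_{sigma g^*}(v) = (v - sigma*y) / (sigma + 1).
        v = u + sigma * (A @ x_tilde)
        u = (v - sigma * y) / (sigma + 1.0)
        # Primal step: prox_{tau f} is the identity because f = 0.
        x_new = x - tau * (A.T @ u)
        # Extrapolation step.
        x_tilde = x_new + rho * (x_new - x)
        x = x_new
    return x, u


rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))   # illustrative random operator
y = rng.standard_normal(50)

x_cp, u_cp = chambolle_pock_least_squares(A, y)
x_ref, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.linalg.norm(x_cp - x_ref))   # distance to the direct solution
print(np.linalg.norm(A.T @ u_cp))     # dual feasibility residual A^T u
```

At convergence the dual iterate approaches $Ax^\star - y$, so the residual $\|A^* u\|_2$ also tends to zero.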

## Deriving the convex conjugates

### Convex conjugate of $f$

Recall that we instantiate the scheme above with the trivial function $f(x)=0$.
The convex conjugate of $f$ reads:

\begin{align*}
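    f^*(u) = \underset{z \in \mathbb{R}^k}{\sup} \quad \langle u, z \rangle_{\mathbb{R}^k} - f(z) = \underset{z \in \mathbb{R}^k}{\sup} \quad \langle u, z \rangle_{\mathbb{R}^k}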
\end{align*}

This supremum is $+\infty$ for every nonzero $u$ (take $z = tu$, so that $\langle u, z \rangle = t\|u\|_2^2 \to +\infty$ as $t \to +\infty$), and it equals $0$ for $u = 0$. Hence $f^*$ is the indicator function of the zero vector, $f^*(u) = \delta_0(u)$, which is $0$ if $u = 0$ and $+\infty$ otherwise.

### Convex conjugate of $g$

Recall that we instantiate the scheme with $g(z) = \frac{1}{2}\|z-y\|_2^2$.
The convex conjugate of $g$ reads:

\begin{align*}
    g^*(u) &= \underset{z \in \mathbb{R}^n}{\max} \quad \langle u, z \rangle_{\mathbb{R}^n} - \frac{1}{2} \|z-y\|_2^2
\end{align*}

The function $c(z) = \langle u, z \rangle_{\mathbb{R}^n} - \frac{1}{2} \|z -y\|_2^2$ is concave and differentiable, so its maximum is attained where its gradient vanishes:
 
\begin{align}
    \frac{\partial c}{\partial z} &= 0\\
    \frac{\partial \langle u, z \rangle }{\partial z} - \frac{1}{2} \left(
    \frac{\partial \langle z, z \rangle }{\partial z} +
    \frac{\partial \langle y, y \rangle }{\partial z} - 2
    \frac{\partial \langle z, y \rangle }{\partial z} \right) &= 0\\
    u - z + y &= 0\\
    z &= u + y
\end{align}
 
Now that we have found the optimum, we can express the convex conjugate $g^*(u)$:
 
\begin{align}
  g^*(u) &= c(u + y)\\
  &= \langle u, u + y \rangle - \frac{1}{2}\| u + y - y \|_2^2\\
  &= \|u\|_2^2 + \langle u, y \rangle - \frac{1}{2}\|u\|_2^2 \\
  &= \frac{1}{2}\|u\|_2^2 + \langle u, y \rangle_{\mathbb{R}^n}
\end{align}
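As a numerical sanity check of the derivation above (a sketch assuming NumPy, with illustrative random `u` and `y`), one can compare $c(u+y)$ with the closed form $\frac{1}{2}\|u\|_2^2 + \langle u, y\rangle$ and verify that random perturbations of $z = u + y$ never increase $c$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
y = rng.standard_normal(n)  # illustrative data
u = rng.standard_normal(n)  # point where the conjugate is evaluated


def c(z):
    # Objective of the max defining g^*(u): <u, z> - 1/2 ||z - y||^2
    return u @ z - 0.5 * np.sum((z - y) ** 2)


def g_conj(u):
    # Closed form derived above: g^*(u) = 1/2 ||u||^2 + <u, y>
    return 0.5 * u @ u + u @ y


z_star = u + y  # maximizer found by setting the gradient to zero
print(np.isclose(c(z_star), g_conj(u)))

# No random perturbation of z_star should give a larger value of c.
perturbed = [c(z_star + 0.1 * rng.standard_normal(n)) for _ in range(1000)]
print(max(perturbed) <= c(z_star))
```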

## Deriving the proximity operator of $g^*$

The proximity operator of $g^*$ with parameter $\gamma > 0$ reads:

\begin{equation}
    \mathrm{prox}_{\gamma g^*}(u) = \underset{z \in \mathbb{R}^n}{\operatorname{argmin}} \quad \frac{1}{2\gamma} \|u-z\|_2^2 + \frac{1}{2} \|z\|_2^2 + \langle z, y \rangle_{\mathbb{R}^n}
\end{equation}

The function $d(z) = \frac{1}{2\gamma} \|u-z\|_2^2 + \left( \frac{1}{2} \|z\|_2^2 + \langle z, y \rangle_{\mathbb{R}^n} \right)$ is strongly convex and differentiable, so its unique minimizer is the point where its gradient vanishes:

\begin{align}
    \frac{\partial d}{\partial z} &= 0\\
    \frac{1}{2\gamma} \left(
        \frac{\partial \langle u, u \rangle }{\partial z} +
        \frac{\partial \langle z, z \rangle }{\partial z} - 2
        \frac{\partial \langle u, z \rangle }{\partial z} \right) +
        \frac{1}{2}\frac{\partial \langle z, z \rangle }{\partial z} +
        \frac{\partial \langle z, y \rangle }{\partial z} &= 0\\
    \frac{z-u}{\gamma} + z + y &= 0\\
    \left( \frac{\gamma+1}{\gamma} \right) z - \frac{1}{\gamma} u + y &= 0\\
    z &= \frac{u-\gamma y}{\gamma+1}
\end{align}
 
We thus obtain the proximity operator:
\begin{equation}
  \mathrm{prox}_{\gamma g^*}(u) = \frac{u-\gamma y}{\gamma+1}
\end{equation}
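The closed form above can be checked numerically in two ways: the optimality condition $\frac{z-u}{\gamma} + z + y = 0$ should hold at $z = \mathrm{prox}_{\gamma g^*}(u)$, and the result should agree with Moreau's identity $\mathrm{prox}_{\gamma g^*}(u) = u - \gamma\,\mathrm{prox}_{g/\gamma}(u/\gamma)$, a standard result not used in the text above. With the convention used here, $\mathrm{prox}_{\lambda g}(v) = (v + \lambda y)/(1+\lambda)$. The helper names `prox_g_conj` and `prox_g`, the value of $\gamma$ and the random data are illustrative assumptions.

```python
import numpy as np


def prox_g_conj(u, y, gamma):
    # Closed form derived above: (u - gamma * y) / (gamma + 1)
    return (u - gamma * y) / (gamma + 1.0)


def prox_g(v, y, lam):
    # prox of lam * g with g(z) = 1/2 ||z - y||^2: minimizer of
    # 1/(2 lam) ||v - z||^2 + 1/2 ||z - y||^2, i.e. (v + lam * y) / (1 + lam)
    return (v + lam * y) / (1.0 + lam)


rng = np.random.default_rng(2)
n, gamma = 6, 0.7
u = rng.standard_normal(n)
y = rng.standard_normal(n)
z = prox_g_conj(u, y, gamma)

# 1) Optimality condition of d: (z - u) / gamma + z + y = 0
residual = (z - u) / gamma + z + y
print(np.allclose(residual, 0.0))

# 2) Moreau's identity: prox_{gamma g^*}(u) = u - gamma * prox_{g/gamma}(u / gamma)
moreau = u - gamma * prox_g(u / gamma, y, 1.0 / gamma)
print(np.allclose(z, moreau))
```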

## Wrapping up

We can now write the dual problem of the original least-squares problem:
 
\begin{align}
    \underset{u \in \mathbb{R}^n}{\max} \quad & -f^*(-A^*u) - g^*(u)\\
    \underset{u \in \mathbb{R}^n}{\max} \quad & -\delta_{0}(-A^* u) -
      \frac{1}{2} \|u\|_2^2 - \langle u, y \rangle_{\mathbb{R}^n} \\
    \underset{u \in \mathbb{R}^n}{\max} \quad & -\frac{1}{2} \|u\|_2^2 - \langle u, y \rangle_{\mathbb{R}^n} \quad \text{such that} \quad A^* u = 0
\end{align}
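A quick numerical check of this dual problem, assuming NumPy and illustrative random data: pairing the least-squares solution $x^\star$ with $u^\star = Ax^\star - y$ (that is, $u^\star = \nabla g(Ax^\star)$, an optimality relation not derived in the text above) should give a dual-feasible point, $A^* u^\star = 0$, whose dual objective equals the primal objective.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 15
A = rng.standard_normal((n, k))
y = rng.standard_normal(n)

x_star, *_ = np.linalg.lstsq(A, y, rcond=None)
u_star = A @ x_star - y   # candidate dual solution: u* = grad g(A x*)

primal = 0.5 * np.sum((A @ x_star - y) ** 2)
dual = -0.5 * u_star @ u_star - u_star @ y   # valid only when A^T u = 0

print(np.linalg.norm(A.T @ u_star))   # dual feasibility residual, close to 0
print(np.isclose(primal, dual))       # no duality gap
```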
 
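A useful consequence for the meticulous data scientist is that we can now measure the primal-dual gap of the current primal-dual pair $(x, u)$, that is, the primal objective minus the dual objective:

\begin{equation}
  PD(x,u) = \frac{1}{2}\|Ax-y\|_2^2 + \frac{1}{2} \|u\|_2^2 + \langle u, y \rangle_{\mathbb{R}^n} + \delta_0(A^* u)
\end{equation}

By weak duality this gap is nonnegative, and it vanishes exactly at a primal-dual optimal pair. Since the indicator term is $+\infty$ unless $A^*u = 0$, a primal-dual gap numerically close to zero can be considered an optimality certificate only if the dual feasibility residual $\|A^*u\|_2$ is also close to zero.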
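A minimal sketch of this certificate, assuming NumPy; the function name `primal_dual_gap` and the random data are hypothetical. It returns the smooth part of the gap together with the feasibility residual $\|A^* u\|_2$. When $A^* u = 0$, the smooth part equals $\frac{1}{2}\|Ax - y - u\|_2^2$, since then $\langle u, y \rangle = \langle u, y - Ax \rangle$; without feasibility, however, it can vanish at a non-optimal pair, for instance $x = 0$ and $u = -y$, which the example illustrates.

```python
import numpy as np


def primal_dual_gap(A, y, x, u):
    """Return (smooth part of the gap, dual feasibility residual ||A^T u||).

    The full gap also contains delta_0(A^T u), which is +inf unless A^T u = 0,
    so the smooth part is only meaningful when the residual is small too."""
    r = A @ x - y
    smooth_gap = 0.5 * r @ r + 0.5 * u @ u + u @ y
    feasibility = np.linalg.norm(A.T @ u)
    return smooth_gap, feasibility


rng = np.random.default_rng(4)
A = rng.standard_normal((30, 10))
y = rng.standard_normal(30)
x_star, *_ = np.linalg.lstsq(A, y, rcond=None)

# Optimal pair: both quantities vanish up to rounding.
gap_opt, feas_opt = primal_dual_gap(A, y, x_star, A @ x_star - y)
print(gap_opt, feas_opt)

# Pitfall: x = 0, u = -y also gives a zero smooth gap, but u is infeasible.
gap0, feas0 = primal_dual_gap(A, y, np.zeros(10), -y)
print(gap0, feas0)
```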

## Numerical experiment

We now apply this scheme to an unregularized deconvolution problem.