# Math and linear algebra
import numpy as np
import numpy.linalg as la

# Plotting
import matplotlib as mpl
mpl.rcParams['figure.figsize'] = (15.0, 15.0)
#mpl.rc('text', usetex = True)
import matplotlib.pyplot as plt
%matplotlib inline

# Solving least absolute deviation with a positivity constraint using Chambolle-Pock

## Introduction

This notebook was written after reading an article by Camille Sutour on noise estimation in images, and more precisely on the noise level function (NLF).
In that work, she models the NLF as a second-order polynomial with non-negative coefficients, hence the positivity constraint.

To make the estimator more robust to outliers, she uses least absolute deviation instead of classical least squares.

#### The use case: noise model estimation

We assume that a dedicated routine has already identified pure noise patches in an image.

Assuming locally stationary noise, one can then compute, for each patch of $N$ pixels, its mean $\mu=\frac{1}{N} \sum_{i=0}^{N-1} x_i$ and the unbiased variance estimator $\sigma^2=\frac{1}{N-1} \sum_{i=0}^{N-1} (x_i-\mu)^2$.
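The two per-patch statistics can be sketched with NumPy as follows. The helper name `patch_statistics` and the synthetic patch are assumptions made for illustration; they are not taken from the article.

```python
import numpy as np

def patch_statistics(patch):
    """Return the sample mean and the unbiased variance of one patch."""
    pixels = np.asarray(patch, dtype=float).ravel()
    mu = pixels.mean()
    # ddof=1 divides by N - 1, which gives the unbiased variance estimator.
    var = pixels.var(ddof=1)
    return mu, var

# Synthetic 16x16 "pure noise" patch (assumed data): constant intensity 0.5
# plus Gaussian noise of standard deviation 0.1.
rng = np.random.default_rng(0)
patch = 0.5 + 0.1 * rng.standard_normal((16, 16))
mu_hat, var_hat = patch_statistics(patch)
```

Applying this to every detected patch yields the list of (mean, variance) pairs used in the rest of the notebook.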

Given a list of pairs $\left( \hat{\mu}_p, \hat{\sigma}_p^2 \right)$, one is now interested in the Noise Level Function (NLF), a function $\mathbb{R} \rightarrow \mathbb{R}^+$ that gives the noise variance for a given level of image intensity.

The author assumes that the NLF is a positive, increasing second-order polynomial of the image intensity. One could then simply solve the following least-squares problem:

\begin{align*}
  \underset{x\in\mathbb{R}^3}{\text{min}} \qquad ||Ax-b||_2^2
\end{align*}

where, with $N$ now denoting the number of patches:

* $A=\begin{pmatrix} \mu_0^2 & \mu_0 & 1\\
\mu_1^2 & \mu_1 & 1\\
\vdots & \vdots & \vdots\\
\mu_{N-1}^2 & \mu_{N-1} & 1\end{pmatrix}$

* $b=\begin{pmatrix}\sigma_0^2\\ \sigma_1^2 \\ \vdots \\ \sigma_{N-1}^2\end{pmatrix}$

The solution vector $\hat{x}$ is given by the Moore-Penrose pseudo-inverse: $\hat{x} = A^+ b$, where $A^+=(A^T A)^{-1} A^T$ when $A$ has full column rank.

However, the author prefers least absolute deviation with a positivity constraint:

\begin{align*}
  \underset{x\in\mathbb{R}^{3+}}{\text{min}} \qquad ||Ax-b||_1
\end{align*}
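To make the least-squares baseline and the $\ell^1$ cost concrete, here is a minimal sketch on synthetic data. The coefficients `x_true`, the noise level and the helper `lad_cost` are assumptions chosen for illustration only; they are not values from the article.

```python
import numpy as np

# Synthetic (mean, variance) pairs generated from an invented quadratic NLF.
rng = np.random.default_rng(0)
x_true = np.array([0.02, 0.1, 0.005])   # coefficients of mu^2, mu and 1
mu = rng.uniform(0.0, 1.0, size=200)
A = np.column_stack([mu**2, mu, np.ones_like(mu)])
b = A @ x_true + 1e-3 * rng.standard_normal(mu.size)

# Least-squares solution; lstsq avoids forming (A^T A)^{-1} explicitly.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

# Same solution through the explicit formula, valid because A has full
# column rank here.
x_pinv = np.linalg.inv(A.T @ A) @ A.T @ b

def lad_cost(x):
    """l1 data term ||Ax - b||_1 of the least absolute deviation problem."""
    return np.abs(A @ x - b).sum()
```

Nothing in the least-squares solution forces the coefficients to be non-negative, and a few outlying pairs can pull it away from the bulk of the data, which is what the constrained $\ell^1$ formulation is meant to address.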

## Studying the problem under the CP framework

### First overview of CP mapping
First, note that the constrained optimization problem can be recast as a problem without explicit constraints, simply by adding the convex indicator function of the non-negative orthant of $\mathbb{R}^3$:

\begin{align*}
  \delta_{\mathbb{R}^{3+}}(x) &= 
    \begin{cases}0 \qquad &\text{if } x \in \mathbb{R}^{3+} \\
    +\infty & \text{otherwise} \end{cases}
\end{align*}

The new optimization problem then reads:

\begin{align*}
  \underset{x}{\text{min}} \quad ||Ax-b||_1 + \delta_{\mathbb{R}^{3+}}(x)
\end{align*}
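As a small illustration, the penalized objective can be evaluated directly: it equals the $\ell^1$ cost on the non-negative orthant and $+\infty$ elsewhere. The function names and the tiny $2 \times 3$ system below are hypothetical.

```python
import numpy as np

def indicator_nonneg(x):
    """Convex indicator of the non-negative orthant: 0 inside, +inf outside."""
    return 0.0 if np.all(np.asarray(x) >= 0) else np.inf

def penalized_objective(A, b, x):
    """||Ax - b||_1 plus the indicator of the non-negative orthant."""
    return np.abs(A @ x - b).sum() + indicator_nonneg(x)

# Tiny made-up system, only to exercise both branches of the indicator.
A = np.array([[1.0, 1.0, 1.0], [4.0, 2.0, 1.0]])
b = np.array([3.0, 7.0])
feasible = penalized_objective(A, b, np.array([1.0, 1.0, 1.0]))
infeasible = penalized_objective(A, b, np.array([1.0, -1.0, 1.0]))
```

Any minimizer of this objective therefore has to lie in the non-negative orthant, so the two formulations share the same solutions.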

We can now try to map this problem to the Chambolle-Pock template:

\begin{align*}
  \underset{x}{\text{min}} \quad f(x) + g(Lx)
\end{align*}
where $f$ and $g$ are convex functions whose proximity operators can be computed, and $L$
is a linear operator.

The associated dual problem is:

\begin{align*}
  \underset{u}{\text{min}} \quad f^*(-L^*u) + g^*(u)
\end{align*}

### Recalling the principle of the CP algorithm

The Chambolle-Pock algorithm takes initial estimates $x^{(0)}$ and $u^{(0)}$ of the primal and dual solutions (with $\tilde{x}^{(0)} = x^{(0)}$), a parameter $\tau>0$, a second parameter $\sigma>0$ such that $\sigma \tau \|L\|^2 < 1$, and an extrapolation parameter $\rho \in [0,1]$ (convergence is established for $\rho = 1$ in the original Chambolle-Pock paper), and iterates, for $k=1,2,\ldots$:
  
\begin{align*}
    u^{(k)} &= \mathrm{prox}_{\sigma g^*}\left( u^{(k-1)} + \sigma L \tilde{x}^{(k-1)} \right) \\
    x^{(k)} &= \mathrm{prox}_{\tau f}\left( x^{(k-1)}-\tau L^* u^{(k)} \right) \\
    \tilde{x}^{(k)} &= x^{(k)} + \rho \left( x^{(k)}-x^{(k-1)} \right)
\end{align*}
  
Here $x^{(k)}$ converges to a primal solution $x^\star$ and $u^{(k)}$ converges to a dual solution $u^\star$.
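The three update steps can be sketched generically, with the proximity operators and the linear operator passed as callables. The function name `chambolle_pock` and its signature are hypothetical. The toy problem used to exercise it, $f(x) = \frac{1}{2}\|x-c\|_2^2$, $g = \|\cdot\|_1$ and $L = \mathrm{Id}$, is not the NLF problem; it is chosen only because its solution is known in closed form (soft thresholding of $c$ at level 1).

```python
import numpy as np

def chambolle_pock(prox_f, prox_g_conj, L, L_adj, x0, u0, tau, sigma,
                   rho=1.0, n_iter=500):
    """Sketch of the iteration for min_x f(x) + g(Lx).

    prox_f(v, tau)        -> prox_{tau f}(v)
    prox_g_conj(v, sigma) -> prox_{sigma g^*}(v)
    L, L_adj              -> callables applying L and its adjoint L^*
    """
    x, u = x0.copy(), u0.copy()
    x_tilde = x.copy()
    for _ in range(n_iter):
        u = prox_g_conj(u + sigma * L(x_tilde), sigma)   # dual update
        x_prev = x
        x = prox_f(x - tau * L_adj(u), tau)              # primal update
        x_tilde = x + rho * (x - x_prev)                 # extrapolation
    return x, u

# Toy check (assumed example): f(x) = 0.5 * ||x - c||^2, g = l1 norm, L = Id.
c = np.array([3.0, -0.5, 1.5, -2.0])
prox_f = lambda v, tau: (v + tau * c) / (1.0 + tau)
# g^* is the indicator of the l_inf unit ball, so its prox is a projection.
prox_g_conj = lambda v, sigma: np.clip(v, -1.0, 1.0)
identity = lambda v: v

# sigma * tau * ||L||^2 = 0.81 < 1, as required by the step-size condition.
x_cp, u_cp = chambolle_pock(prox_f, prox_g_conj, identity, identity,
                            np.zeros(4), np.zeros(4), tau=0.9, sigma=0.9,
                            n_iter=2000)
x_closed_form = np.sign(c) * np.maximum(np.abs(c) - 1.0, 0.0)
```

Passing the operators as callables keeps the loop independent of how the problem is split between $f$, $g$ and $L$, so the same sketch can be reused once that mapping is fixed.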

We recall that being able to compute the proximity operator of $f^*$ is equivalent to being able to compute the proximity operator of $f$, thanks to the Moreau identity:
  
\begin{equation}
    x = \mathrm{prox}_{\gamma f^*}(x) + \gamma \mathrm{prox}_{\frac{f}{\gamma}}\left(\frac{x}{\gamma}\right)
\end{equation}
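The identity can be checked numerically for a case where both sides are known. The sketch below assumes $f = \|\cdot\|_1$, whose conjugate $f^*$ is the indicator of the $\ell^\infty$ unit ball. The helper names are hypothetical.

```python
import numpy as np

def soft_threshold(x, t):
    """Prox of t * ||.||_1, applied component-wise."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_l1_conj(x, gamma):
    """Prox of gamma * f^* for f = l1 norm: f^* is the indicator of the
    l_inf unit ball, so the prox is a projection (independent of gamma)."""
    return np.clip(x, -1.0, 1.0)

gamma = 0.7
x = np.array([-3.0, -0.4, 0.0, 0.9, 2.5])
# prox_{f / gamma}(x / gamma) is soft thresholding with threshold 1 / gamma.
rhs = prox_l1_conj(x, gamma) + gamma * soft_threshold(x / gamma, 1.0 / gamma)
```

Here `rhs` reproduces `x` up to floating-point error, which is exactly what the Moreau identity states.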

### Studying the $g$ part

We recall that the proximal operator of the convex indicator $\delta_{\mathbb{R}^{3+}}$ is simply the projection onto the non-negative orthant $\mathbb{R}^{3+}$, applied to each component $x_n$:

\begin{align*}
  \text{prox}_{\delta_{\mathbb{R}^{3+}}}(x_n) = \max \left( 0, x_n \right)
\end{align*}
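In NumPy this projection is a one-liner. The function name is hypothetical.

```python
import numpy as np

def prox_indicator_nonneg(x):
    """Projection onto the non-negative orthant (component-wise max with 0)."""
    return np.maximum(x, 0.0)

projected = prox_indicator_nonneg(np.array([-1.5, 0.0, 2.0]))
```

Because the indicator only takes the values $0$ and $+\infty$, scaling it by a positive step size leaves it unchanged, so the same projection serves for any step.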


### Studying the $f$ part

We recall that the proximal operator of the $\ell^1$ norm is soft thresholding, applied to each component $x_n \neq 0$ (a zero component is mapped to zero):

\begin{align*}
  \text{prox}_{\gamma ||\cdot||_1}(x_n) = \max \left( 0, 1-\frac{\gamma}{|x_n|} \right) x_n
\end{align*}
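A minimal NumPy sketch of soft thresholding follows. It uses the equivalent form $\mathrm{sign}(x_n)\max(|x_n|-\gamma, 0)$, which avoids the division by $|x_n|$ when a component is zero. The function name `prox_l1` is hypothetical.

```python
import numpy as np

def prox_l1(x, gamma):
    """Soft thresholding: prox of gamma * ||.||_1, applied component-wise."""
    x = np.asarray(x, dtype=float)
    # Shrink each magnitude by gamma and clamp at zero, keeping the sign.
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

shrunk = prox_l1(np.array([-2.0, -0.3, 0.0, 0.5, 1.5]), gamma=0.5)
```

Components whose magnitude is at most $\gamma$ are set to zero, and the others are pulled toward zero by $\gamma$.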
