# Proximal algorithms

In this notebook, we implement our proximal optimization algorithms.

# 1. Proximal Gradient algorithm

For minimizing a function $F:\mathbb{R}^n \to \mathbb{R}$ equal to $f+g$ where $f$ is differentiable and the $\mathbf{prox}$ of $g$ is known, given:
* the function to minimize `F`
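* a first-order oracle for $f$ (returning $\nabla f$) `f_grad`
* a proximity operator for $g$ `g_prox`, called as `g_prox(v, gamma)` to return $\mathbf{prox}_{\gamma g}(v)$
* an initialization point `x0`
* a stepsize $\gamma>0$ `step`
* the sought precision `PREC`
* a maximal number of iterations `ITE_MAX`
* a display boolean variable `PRINT`

this algorithm performs iterations of the form
$$ x_{k+1} = \mathbf{prox}_{\gamma g}\left( x_k - \gamma \nabla f(x_k) \right) $$
where $\gamma>0$ is the stepsize to choose. When $f$ and $g$ are convex and $\nabla f$ is $L$-Lipschitz continuous, $\gamma = 1/L$ is a classical choice that guarantees convergence.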
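
As an illustration of the `g_prox(v, gamma)` convention used in the code (returning $\mathbf{prox}_{\gamma g}(v)$), here is a sketch for the assumed example $g = \lambda\|\cdot\|_1$, whose proximity operator is the componentwise soft-thresholding $\mathrm{sign}(v_i)\max(|v_i| - \gamma\lambda, 0)$. The helpers `soft_threshold` and `prox_by_grid` are illustrative names, not part of the notebook; the brute-force grid search only checks the closed form against the definition $\mathbf{prox}_{\gamma g}(v) = \arg\min_u g(u) + \frac{1}{2\gamma}\|u - v\|_2^2$ in one dimension.

```python
import numpy as np

def soft_threshold(v, gamma, lam=1.0):
    """prox_{gamma*g}(v) for the assumed g = lam*||.||_1: shrink each entry toward 0."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

def prox_by_grid(g, v, gamma, grid):
    """Brute-force 1D prox: minimize g(u) + (u - v)^2 / (2*gamma) over grid points."""
    values = g(grid) + (grid - v) ** 2 / (2.0 * gamma)
    return grid[np.argmin(values)]

lam = 1.0
gamma = 0.5
grid = np.linspace(-5.0, 5.0, 200001)  # spacing 5e-5
for v in [3.0, -0.5, 1.2, -2.7]:
    closed_form = soft_threshold(np.array([v]), gamma, lam)[0]
    brute_force = prox_by_grid(lambda u: lam * np.abs(u), v, gamma, grid)
    print(v, closed_form, brute_force)
```

Both columns agree up to the grid spacing: entries with $|v_i| \le \gamma\lambda$ are set exactly to zero, the others are moved toward zero by $\gamma\lambda$.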

In [None]:
import numpy as np
import timeit

def proximal_gradient_algorithm(F , f_grad , g_prox , x0 , step , PREC , ITE_MAX , PRINT ):
    x = np.copy(x0)
    x_tab = np.copy(x)
    if PRINT:
        print("------------------------------------\n Proximal gradient algorithm\n------------------------------------\nSTART    -- stepsize = {:0}".format(step))
    t_s =  timeit.default_timer()
    n_iter = 0
    for k in range(ITE_MAX):
        x_old = x
        g = f_grad(x)
        x = g_prox(x - step*g , step)  #######  ITERATION: gradient step on f, then prox step on g

        x_tab = np.vstack((x_tab,x))   # store every iterate (one row per iteration)
        n_iter = k + 1

        # Stopping criterion: two successive iterates are closer than PREC
        if np.linalg.norm(x - x_old) <= PREC:
            break

    t_e =  timeit.default_timer()
    if PRINT:
        print("FINISHED -- {:d} iterations / {:.6f}s -- final value: {:f}\n\n".format(n_iter,t_e-t_s,F(x)))
    return x,x_tab
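
A minimal, self-contained sketch of the same iteration on an assumed lasso problem $\min_x \frac12\|Ax-b\|_2^2 + \lambda\|x\|_1$, so that $f$ is the least-squares term and $g = \lambda\|\cdot\|_1$. The function `lasso_prox_grad` is hypothetical and runs a fixed number of iterations instead of testing `PREC`; the stepsize $1/L$ with $L = \|A\|_2^2$ (the Lipschitz constant of $\nabla f$) is the classical choice for this $f$.

```python
import numpy as np

def soft_threshold(v, gamma, lam):
    """prox_{gamma*lam*||.||_1}(v): componentwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - gamma * lam, 0.0)

def lasso_prox_grad(A, b, lam, n_iter=500):
    """Proximal gradient for 0.5*||Ax - b||^2 + lam*||x||_1 with stepsize 1/L."""
    L = np.linalg.norm(A, 2) ** 2            # squared largest singular value of A
    step = 1.0 / L
    F = lambda x: 0.5 * np.sum((A @ x - b) ** 2) + lam * np.sum(np.abs(x))
    f_grad = lambda x: A.T @ (A @ x - b)     # gradient of the smooth part f
    g_prox = lambda v, gamma: soft_threshold(v, gamma, lam)  # prox_{gamma*g}(v)
    x = np.zeros(A.shape[1])
    values = [F(x)]                          # objective value at each iterate
    for _ in range(n_iter):
        x = g_prox(x - step * f_grad(x), step)
        values.append(F(x))
    return x, values

# Usage: with A = I the problem separates and its solution is soft_threshold(b, 1, lam)
b = np.array([3.0, -0.5, 1.2])
x_star, values = lasso_prox_grad(np.eye(3), b, lam=1.0, n_iter=10)
```

With $A$ equal to the identity, $L = 1$, the stepsize is $1$, and the first iteration already lands on the exact solution. With the stepsize $1/L$, the objective values in `values` are non-increasing, which is a cheap sanity check on any implementation.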

# 2. ADMM

For minimizing a function $F:\mathbb{R}^n \to \mathbb{R}$ equal to $f+g$ where the proximity operators of $f$ and $g$ are known, given:
* the function to minimize `F`
* a proximity operator for $f$ `f_prox` 
* a proximity operator for $g$ `g_prox` 
* a parameter $\rho>0$ `rho` 
* an initialization point `x0`
* the sought precision `PREC` 
* a maximal number of iterations `ITE_MAX` 
* a display boolean variable `PRINT` 

ADMM performs iterations of the form
\begin{align*}
x_{k+1} &= \mathbf{prox}_{f/\rho} \left( z_k - \lambda_k/\rho \right) \\
z_{k+1} &= \mathbf{prox}_{g/\rho} \left( x_{k+1} + \lambda_k/\rho \right) \\
\lambda_{k+1} &= \lambda_k + \rho\left( x_{k+1} - z_{k+1} \right)
\end{align*}
where $\rho>0$ is a hyperparameter to set. It is also very useful to keep track of the *primal and dual residuals*, which both tend to zero as ADMM converges:
$$ p_k = x_k - z_k \text{ [Primal residual]}  ~~~~ \text{ and } ~~~~  d_k = \rho(z_k - z_{k-1} )  \text{ [Dual residual]}$$
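
To see where these updates come from, assume the standard splitting used by ADMM: $F = f + g$ is minimized as $f(x) + g(z)$ subject to $x = z$, with augmented Lagrangian $L_\rho$, and write $\mathbf{prox}_{h/\rho}(v) = \arg\min_u h(u) + \frac{\rho}{2}\|u - v\|_2^2$. Completing the square in each minimization only drops terms that do not depend on the minimized variable:
$$ L_\rho(x,z,\lambda) = f(x) + g(z) + \lambda^\top (x - z) + \frac{\rho}{2}\|x - z\|_2^2 $$
\begin{align*}
x_{k+1} &= \arg\min_x \; f(x) + \lambda_k^\top x + \frac{\rho}{2}\|x - z_k\|_2^2 = \arg\min_x \; f(x) + \frac{\rho}{2}\left\|x - \left(z_k - \lambda_k/\rho\right)\right\|_2^2 = \mathbf{prox}_{f/\rho}\left(z_k - \lambda_k/\rho\right) \\
z_{k+1} &= \arg\min_z \; g(z) - \lambda_k^\top z + \frac{\rho}{2}\|x_{k+1} - z\|_2^2 = \arg\min_z \; g(z) + \frac{\rho}{2}\left\|z - \left(x_{k+1} + \lambda_k/\rho\right)\right\|_2^2 = \mathbf{prox}_{g/\rho}\left(x_{k+1} + \lambda_k/\rho\right)
\end{align*}
The $\lambda$-update is then a dual ascent step of size $\rho$ along the constraint residual $x_{k+1} - z_{k+1}$.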

The values returned by the function should be:
* the final $z$-point  `x`
* the table of the iterates $(z_k)$ for all iterations `x_tab`
* the list of the norms of the primal residuals $(\|p_k\|_2) $ for all iterations `p_tab`
* the list of the norms of the dual residuals $(\|d_k\|_2) $ for all iterations `d_tab`



> Complete the function below to implement ADMM.

> Implement a precision-based stopping criterion. For instance, Boyd et al., "Distributed Optimization and Statistical Learning via the Alternating Direction Method of Multipliers", *Foundations and Trends in Machine Learning*, 2011, advise stopping once both residuals are small; with a single tolerance $\varepsilon$ (`PREC`) for both, this reads
$$ \|p_k\|_2 \leq \varepsilon \text{ and }  \|d_k\|_2 \leq \varepsilon . $$
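
A possible sketch of this test, assuming the residual definitions given earlier in this section and one tolerance `eps` (standing for `PREC`) shared by both residuals; `admm_residuals` and `admm_should_stop` are hypothetical helper names:

```python
import numpy as np

def admm_residuals(x, z, z_old, rho):
    """Norms of p_k = x_k - z_k and d_k = rho * (z_k - z_{k-1})."""
    p_norm = np.linalg.norm(x - z)
    d_norm = np.linalg.norm(rho * (z - z_old))
    return p_norm, d_norm

def admm_should_stop(p_norm, d_norm, eps):
    """Simplified test: both residual norms below the same tolerance eps."""
    return p_norm <= eps and d_norm <= eps

# Usage on hand-made iterates: x and z agree, z moved by 1e-9 since the last step
x = np.array([1.0, 2.0])
z = x.copy()
z_old = np.array([1.0, 2.0 + 1e-9])
p_norm, d_norm = admm_residuals(x, z, z_old, rho=1.0)
print(admm_should_stop(p_norm, d_norm, eps=1e-8))
```

Checking both residuals matters: a small primal residual alone only says that $x_k \approx z_k$, not that the iterates have stopped moving.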

In [None]:
import numpy as np
import timeit

def ADMM(F , f_prox , g_prox , rho , x0 , PREC , ITE_MAX , PRINT ):
    x = np.copy(x0)
    z = np.copy(x0)
    lam = np.zeros_like(x0, dtype=float)   # dual variable, initialized at 0
    
    x_tab = np.copy(x)
    p_tab = []
    d_tab = []
        
    if PRINT:
        print("------------------------------------\n ADMM\n------------------------------------\nSTART    -- rho = {:0}".format(rho))
    t_s =  timeit.default_timer()
    for k in range(ITE_MAX):

        ### UPDATE : TO COMPLETE
        

        x_tab = np.vstack((x_tab,z))

        p = 1.0 #TODO
        d = 1.0 #TODO
        p_tab.append(float(p))
        d_tab.append(float(d))

        # STOPPING CRITERIA TO IMPLEMENT
        

    t_e =  timeit.default_timer()
    if PRINT:
        print("FINISHED -- {:d} iterations / {:.6f}s -- final value: {:f}\n\n".format(k+1,t_e-t_s,F(x)))
    return z,x_tab,p_tab,d_tab
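
For reference, one possible completion of the exercise, written as a separate hypothetical function `admm_sketch` so that it does not overwrite `ADMM`. It assumes, by analogy with the `g_prox(v, step)` call in the proximal gradient code, that `f_prox(v, gamma)` and `g_prox(v, gamma)` return $\mathbf{prox}_{\gamma f}(v)$ and $\mathbf{prox}_{\gamma g}(v)$, so the updates use $\gamma = 1/\rho$; the dual variable starts at $0$. The toy problem $F(x) = \frac12\|x - a\|_2^2 + \mu\|x\|_1$ is also an assumption, chosen because its minimizer is known in closed form (soft-thresholding of $a$ at level $\mu$).

```python
import numpy as np
import timeit

def admm_sketch(F, f_prox, g_prox, rho, x0, PREC, ITE_MAX, PRINT):
    """ADMM for min f(x) + g(z) s.t. x = z.

    Assumed convention: f_prox(v, gamma) = prox_{gamma f}(v) and
    g_prox(v, gamma) = prox_{gamma g}(v); here gamma = 1/rho.
    """
    x = np.copy(x0)
    z = np.copy(x0)
    lam = np.zeros_like(x0, dtype=float)       # dual variable starts at 0
    x_tab = [np.copy(z)]                       # z-iterates, as in the exercise
    p_tab, d_tab = [], []
    t_s = timeit.default_timer()
    n_iter = 0
    for k in range(ITE_MAX):
        z_old = z
        x = f_prox(z - lam / rho, 1.0 / rho)   # x-update
        z = g_prox(x + lam / rho, 1.0 / rho)   # z-update
        lam = lam + rho * (x - z)              # dual update
        x_tab.append(np.copy(z))
        p = np.linalg.norm(x - z)              # primal residual norm
        d = np.linalg.norm(rho * (z - z_old))  # dual residual norm
        p_tab.append(float(p))
        d_tab.append(float(d))
        n_iter = k + 1
        if p <= PREC and d <= PREC:            # stopping criterion
            break
    t_e = timeit.default_timer()
    if PRINT:
        print("FINISHED -- {:d} iterations / {:.6f}s -- final value: {:f}".format(n_iter, t_e - t_s, F(z)))
    return z, np.array(x_tab), p_tab, d_tab

# Toy problem: F(x) = 0.5*||x - a||^2 + mu*||x||_1, split as f = quadratic, g = l1 term
a = np.array([3.0, -0.5, 1.2])
mu = 1.0
F = lambda x: 0.5 * np.sum((x - a) ** 2) + mu * np.sum(np.abs(x))
f_prox = lambda v, gamma: (v + gamma * a) / (1.0 + gamma)   # prox of gamma*f
g_prox = lambda v, gamma: np.sign(v) * np.maximum(np.abs(v) - gamma * mu, 0.0)
z, x_tab, p_tab, d_tab = admm_sketch(F, f_prox, g_prox, rho=1.0, x0=np.zeros(3),
                                     PREC=1e-10, ITE_MAX=1000, PRINT=True)
```

Splitting $F$ this way gives both proximity operators in closed form, so the result can be compared directly with the known minimizer; plotting `p_tab` and `d_tab` on a log scale is a convenient way to monitor convergence.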