### Optimizer description: ZSCG
**Name:** Zeroth-order Stochastic Conditional Gradient<br>
**Class:** zeroOptim.ClassicZSCG <br>
**Paper:** *Zeroth-order Nonconvex Stochastic Optimization: Handling Constraints, High-Dimensionality and Saddle-Points* (Krishnakumar Balasubramanian and Saeed Ghadimi) <br>


**Description:** <br>
At each iteration *k*, the Zeroth-order Stochastic Conditional Gradient method tries to minimize *F(z)* in three main steps:

    1. Estimate the gradient as follows:
$$G_{v}^{k} \equiv G_{v}(z_{k-1}, \xi_{k}, u_{k}) = \frac{1}{m_{k}} \sum_{j=1}^{m_{k}} \frac{F(z_{k-1} + vu_{k,j}, \xi_{k,j}) - F(z_{k-1}, \xi_{k,j})}{v}u_{k,j}$$
    2. Solve this linear programming problem:
$$x_{k} = \arg\min_{u\in\chi}\langle G_{v}^{k}, u\rangle$$
    3. Update z:
$$z_{k} = (1-\alpha_{k})z_{k-1} + \alpha_{k} x_{k}$$

where: <br>
$z_{k}$ is our optimization parameter <br>
$\xi_{k}$ is a sample drawn from our distribution <br>
$u_{k,j} \sim N(0, I_{d})$ <br>
$m_{k}$ is the number of Gaussian vectors to generate at step *k* <br>
$\alpha_{k}$ is the momentum at step *k* <br>
$v$ is the Gaussian smoothing parameter <br>
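
The three steps can be sketched in a few lines of NumPy. This is a minimal illustration, not the code of `zeroOptim.ClassicZSCG`: the function names (`estimate_gradient`, `lmo_linf`, `zscg`) are hypothetical, the objective is assumed to be deterministic (the sample $\xi$ is dropped), *z* is assumed to be a flat vector, and the feasible set $\chi$ is assumed to be an $L_\infty$ ball of radius `epsilon` around the starting point, clipped to the pixel range.

```python
import numpy as np


def estimate_gradient(F, z, v, m, rng):
    """Step 1: average of m Gaussian-smoothing finite differences of F around z."""
    u = rng.standard_normal((m, z.size))        # rows are u_j ~ N(0, I_d)
    f_z = F(z)                                  # F(z_{k-1}), reused for every direction
    diffs = np.array([F(z + v * u_j) - f_z for u_j in u])
    return (diffs[:, None] * u).mean(axis=0) / v


def lmo_linf(g, center, epsilon, bounds=(0.0, 1.0)):
    """Step 2: minimize <g, u> over the L_inf ball around center, clipped to bounds."""
    low = np.maximum(center - epsilon, bounds[0])
    high = np.minimum(center + epsilon, bounds[1])
    # A linear function over a box is minimized coordinate-wise at a corner.
    return np.where(g > 0, low, high)


def zscg(F, z0, v, m, alpha, epsilon, n_steps, seed=0):
    """Run n_steps iterations with constant m and alpha (an assumption of this sketch)."""
    rng = np.random.default_rng(seed)
    z = z0.astype(float).copy()
    for _ in range(n_steps):
        g = estimate_gradient(F, z, v, m, rng)  # step 1: gradient estimate
        x = lmo_linf(g, z0, epsilon)            # step 2: linear minimization
        z = (1 - alpha) * z + alpha * x         # step 3: convex combination
    return z


def F(z):
    """Toy objective used only for this illustration."""
    return float(np.sum((z - 1.0) ** 2))


z0 = np.full(5, 0.5)
z = zscg(F, z0, v=1e-3, m=500, alpha=0.3, epsilon=0.2, n_steps=20)
print(F(z0), F(z))
```

Because every iterate is a convex combination of feasible points, *z* never leaves the feasible set, so no projection step is needed.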

**Args:**

        Name            Type                Description
        x               (torch.tensor)      The variable of our optimization problem. Should be a 3D tensor (img)
        v               (float)             The Gaussian smoothing parameter
        n_gradient      (list)              Number of normal vectors to generate at every step
        ak              (list)              Momentum at every step
        epsilon         (float)             The upper bound of the norm
        L_type          (int)               Either -1 for L_infinity or x for Lx. Default is -1
        batch_size      (int)               Maximum parallelization during the gradient estimation. Default is -1 (= n_gradient)
        C               (tuple)             The boundaries of the pixel values. Default is (0, 1)
        max_steps       (int)               The maximum number of steps. Default is 100
        verbose         (int)               Display information or not. Default is 0
        additional_out  (bool)              Also return all the x. Default is False
        tqdm_disable    (bool)              Disable the tqdm bar. Default is False


     
     
**Suggested values:** <br>
$v = \sqrt{\frac{2B_{L_{\sigma}}}{N(d+3)^3}}$, 
$\alpha_{k} =\frac{1}{\sqrt{N}}$,
$m_{k} = 2B_{L_{\sigma}}(d + 5)N$,
$\forall k \geq 1$

where:<br>
- *N* is the number of steps <br>
- *d* is the dimension of *x* <br>
- $\sigma^2$ is the bound on the variance of the stochastic gradient
- $B \geq \|f(x)\|, \forall x \in \chi$
- $B_{L_{\sigma}} \geq \max\bigg\{\sqrt{\frac{B^2 + \sigma^2}{L}}, 1\bigg\}$

**Empirical values:** <br>
For MNIST we can set:<br>
$N = 100$, $B_{L_{\sigma}} = 1$ and the image is 28 × 28 ($d = 784$), so the suggested formulas give:<br>
- $v \approx 6.4 \times 10^{-6}$
- $\alpha_{k} = 0.1$
- $m_{k} = 157800$
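
As a quick check, the suggested formulas can be evaluated directly for $N = 100$, $B_{L_{\sigma}} = 1$ and $d = 784$. The helper name `suggested_zscg_params` is hypothetical and is not part of the `zeroOptim` package; it only transcribes the three expressions listed under the suggested values.

```python
import math


def suggested_zscg_params(n_steps, dim, b_l_sigma=1.0):
    """Evaluate the suggested v, alpha_k and m_k for a given N, d and B_{L_sigma}."""
    v = math.sqrt(2.0 * b_l_sigma / (n_steps * (dim + 3) ** 3))
    alpha = 1.0 / math.sqrt(n_steps)
    m = math.ceil(2.0 * b_l_sigma * (dim + 5) * n_steps)  # rounded up to an integer count
    return v, alpha, m


v, alpha, m = suggested_zscg_params(n_steps=100, dim=28 * 28)
print(f"v = {v:.2e}, alpha = {alpha}, m = {m}")
```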

**N.B** <br>
In practice it seems that $\alpha_{k}$ can be set much higher (e.g. 0.3) and $m_{k}$ much lower (e.g. 600), and that neither needs to depend on the number of steps *N*.
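
To give a sense of the saving, the sketch below builds constant per-step schedules and counts function evaluations. It assumes that `n_gradient` and `ak` hold one entry per step (inferred from their `list` type, not confirmed here) and that each sampled direction costs two evaluations of *F*, as in the gradient estimator formula; an implementation that reuses $F(z_{k-1})$ would need fewer. The helper `total_evaluations` is hypothetical.

```python
max_steps = 100
d = 28 * 28  # MNIST image dimension


def total_evaluations(n_gradient):
    """Two evaluations of F per sampled direction, summed over all steps."""
    return sum(2 * m for m in n_gradient)


# Schedule from the suggested formula m_k = 2 * B * (d + 5) * N with B = 1.
theory = [2 * 1 * (d + 5) * max_steps] * max_steps

# Constant schedules using the practical values mentioned in the note.
n_gradient = [600] * max_steps
ak = [0.3] * max_steps

print(total_evaluations(theory), total_evaluations(n_gradient))
```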

