### Optimizer description: ZSGD
**Name:**  Zero Stochastic Gradient Descent <br>
**Class:** zeroOptim.ZeroSGD <br>
**Paper:** *Zeroth-order Nonconvex Stochastic Optimization: Handling Constraints, High-Dimensionality and Saddle-Points* (Krishnakumar Balasubramanian and Saeed Ghadimi) <br>


**Description:** <br>
Zero-order Stochastic Gradient Descent works like classical SGD, except that the gradient is replaced by an estimate $G$ built from function evaluations, computed as follows:

$$G_{v}^{k} \equiv G_{v}(x_{k-1}, \xi_{k}, u_{k}) = \frac{1}{m_{k}} \sum_{j=1}^{m_{k}} \frac{F(x_{k-1} + vu_{k,j}, \xi_{k,j}) - F(x_{k-1}, \xi_{k,j})}{v}u_{k,j}$$

where: <br>
$x_{k}$ is the optimization variable at step $k$ <br>
$\xi_{k}$ is a sample from our distribution <br>
$u_{k,j} \sim N(0, I_{d})$ are the random Gaussian directions <br>
$m_{k}$ is the number of directions generated at step $k$ (an input parameter) <br>
$v$ is the Gaussian smoothing parameter
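
As an illustration of the estimator above, here is a minimal sketch in plain Python. It is not the torch code of `zeroOptim.ZeroSGD`: the function name `zo_gradient` and its signature are hypothetical, `x` is a flat list rather than an image tensor, and `F` is assumed to be deterministic (the sample $\xi$ is held fixed), so $F(x_{k-1})$ is evaluated only once.

```python
import random


def zo_gradient(F, x, v, m, rng):
    """Gaussian-smoothing gradient estimate of F at x (hypothetical helper).

    F   : callable taking a list of floats and returning a float
    x   : current point, a list of d floats
    v   : Gaussian smoothing parameter (> 0)
    m   : number of random directions to average over
    rng : random.Random instance used to draw the directions
    """
    d = len(x)
    fx = F(x)  # F is assumed deterministic, so one evaluation is enough
    g = [0.0] * d
    for _ in range(m):
        # Draw u ~ N(0, I_d)
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        x_pert = [xi + v * ui for xi, ui in zip(x, u)]
        # Finite difference along u, scaled back onto the direction u
        scale = (F(x_pert) - fx) / v
        for i in range(d):
            g[i] += scale * u[i] / m
    return g


# Usage: estimate the gradient of a linear function, whose true gradient is `a`
a = [1.0, 2.0, 3.0]
estimate = zo_gradient(
    lambda x: sum(ai * xi for ai, xi in zip(a, x)),
    [0.0, 0.0, 0.0],
    v=0.01,
    m=20000,
    rng=random.Random(0),
)
```

Averaging over more directions (a larger `m`) reduces the variance of the estimate at the cost of more function evaluations.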

**Args:**
    
    Name            Type                Description
    x:              (torch.tensor)      The variable of the optimization problem. Should be a 3D tensor (image)
    v:              (float)             The Gaussian smoothing parameter
    n_gradient:     (list)              Number of normal vectors to generate at each step (m_k)
    ak:             (list)              Pseudo learning rate at each step
    epsilon:        (float)             The upper bound of the infinity norm
    L_type:         (int)               Either -1 for the L_infinity norm or x for the L_x norm. Default is -1
    batch_size:     (int)               Maximum parallelization number during gradient estimation. Default is -1 (n_gradient)
    C:              (tuple)             The bounds of the pixel values. Default is (0, 1)
    max_steps:      (int)               The maximum number of steps. Default is 100
    verbose:        (int)               Whether to display information. Default is 0
    additional_out: (bool)              Also return all the intermediate x. Default is False
    tqdm_disable:   (bool)              Disable the tqdm progress bar. Default is False
     
**Suggested values:**

$v \leq \frac{1}{\sqrt{L C\log d}} \min \bigg\{\sqrt{\frac{2\sigma^2}{L}}, \sqrt{\frac{D_{0}LC}{2N\sigma^2}}\bigg\}$,
$\alpha_{k} =\frac{1}{2L C\log d} \min \bigg\{\frac{1}{12s \log d}, \sqrt{\frac{D_{0}}{N}}\bigg\}$,
$m_{k} = 2(d + 5)N$,
$\forall k \geq 1$

where:<br>
- $N$ is the number of steps
- $d$ is the dimension of $x$
- $L$ is the Lipschitz constant of the gradient
- $C \geq \frac{E\big[||u||_{\infty}^k\big]}{\sqrt{2\log d^k}}$
- $s \geq ||\nabla f(x)||_{0}, \forall x \in \mathbb{R}^d$
- $D_{0} \geq f(x_{0}) - f^*$
- $B_{L_{\sigma}} \geq \max\bigg\{\sqrt{\frac{B^2 + \sigma^2}{L}}, 1\bigg\}$
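
The suggested values above can be evaluated numerically once the problem constants are known or estimated. The sketch below simply transcribes the three formulas. The helper name `suggested_values` is hypothetical and not part of `zeroOptim`, and the constants passed in the usage example are placeholders chosen only for illustration.

```python
import math


def suggested_values(d, N, L, C, s, D0, sigma):
    """Evaluate the suggested v (upper bound), alpha_k and m_k (hypothetical helper).

    d     : dimension of x (must be > 1 so that log d > 0)
    N     : number of steps
    L     : Lipschitz constant of the gradient
    C, s, D0, sigma : problem constants as defined in the list above
    """
    log_d = math.log(d)
    # Upper bound on the Gaussian smoothing parameter v
    v_max = min(
        math.sqrt(2 * sigma**2 / L),
        math.sqrt(D0 * L * C / (2 * N * sigma**2)),
    ) / math.sqrt(L * C * log_d)
    # Step size, identical for every k
    alpha = min(1 / (12 * s * log_d), math.sqrt(D0 / N)) / (2 * L * C * log_d)
    # Number of Gaussian directions per step, identical for every k
    m = 2 * (d + 5) * N
    return v_max, alpha, m


# Usage with placeholder constants
v_max, alpha, m = suggested_values(d=100, N=10, L=1.0, C=1.0, s=5, D0=1.0, sigma=1.0)
```

Since `n_gradient` and `ak` are lists, these constant values would presumably be repeated once per step; that mapping is an assumption, not something the argument table states.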

