## Optimizer instructions

#### COMMON PARAMETERS

All optimizer classes must share the following points.

**1. The initializer of the optimizer must take the following args:**


    Args:
    Name            Type                Description
    model           torch.nn.Module     The model to attack
    loss            src.Loss.Loss       The loss to use
    device          torch.device        The device used for the computation
    
    Code example:
```
import torch

device = torch.device('cuda')
myNet = myCustomNetwork()
myLoss = myCustomLoss()
myOptimizer = myOptimizerClass(model=myNet, loss=myLoss, device=device)
```
    
    
**2. The optimizer must have the run method with the following args:**


    Args:
    Name            Type                Description
    x:              (torch.tensor)      The variable of our optimization problem. Must be a 3D tensor (img)
    n_gradient:     (int)               The number of function evaluations used for the gradient estimation
    batch_size:     (int)               The maximum parallelization during the gradient estimation. Default is -1 (= n_gradient)
    C:              (tuple)             The boundaries of the pixel values. Default is (0, 1)
    max_steps:      (int)               The maximum number of steps. Default is 100
    verbose:        (int)               Display information or not. Default is 0
    additional_out: (bool)              Also return all the x found during the process. Default is False
    tqdm_disable:   (bool)              Disable the tqdm bar. Default is False

    Other common args:
    Name            Type                Description
    epsilon:        (float)             The upper bound of the norm
    L_type:         (int)               Either -1 for L_infinity or x for L_x. Default is -1
    
    Code example:
    args              = {'x': my_chosen_tensor_3d_img, 'n_gradient': 3000, 'batch_size': 100, ...}
    other_common_args = {'epsilon': 0.3, 'L_type': -1}
    specific_args     = {'beta': 0.2, 'gamma': 0.3}
    myOptimizer.run(**args, **other_common_args, **specific_args)
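
The arguments `epsilon`, `L_type` and `C` together describe the feasible set of the attack. The sketch below shows one way an optimizer could enforce them after each step. It is an illustration under stated assumptions, not the repository's code: the helper name `project` and the `x_orig` argument are hypothetical, only `L_type` values -1 and 2 are handled, and plain Python lists stand in for a flattened image tensor so the example has no dependencies.

```python
import math

def project(x, x_orig, epsilon, L_type=-1, C=(0.0, 1.0)):
    """Keep x inside the epsilon-ball around x_orig and inside the pixel bounds C.

    x and x_orig are flat lists of floats (a stand-in for a flattened image).
    """
    delta = [xi - oi for xi, oi in zip(x, x_orig)]
    if L_type == -1:
        # L_infinity ball: clip every coordinate of the perturbation
        delta = [max(-epsilon, min(epsilon, d)) for d in delta]
    elif L_type == 2:
        # L_2 ball: shrink the perturbation if its Euclidean norm is too large
        norm = math.sqrt(sum(d * d for d in delta))
        if norm > epsilon:
            delta = [d * epsilon / norm for d in delta]
    else:
        raise ValueError("this sketch only handles L_type -1 and 2")
    low, high = C
    # Clip the perturbed example to the valid pixel range
    return [max(low, min(high, oi + d)) for oi, d in zip(x_orig, delta)]
```

When `x_orig` already lies inside `C`, the final clipping can only move each coordinate closer to `x_orig`, so the result still satisfies the norm constraint.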
      

**3. The optimizer must return the following output:**


    Outputs
    Name            Type                Description
    x:              (torch.tensor)      The last found optimization variable. Must be a 3D tensor (img)
    losses:         (list)              A list (of float) of the losses at each step
    outs:           (list)              A list (of float) of the outputs of the target neuron at each step
    
    Only if additional_out = True
    list_of_x:      (list)              A list of torch.tensor with the input at each step
    
    Code example:
    new_x, loss_list, out_list = myOptimizer.run(**args, **other_common_args, **specific_args)

    
**4. Stopping criterion:** <br>

    4.1 If the attack is untargeted and the label of the original example is n, STOP when the argmax of the model
        output of the last found example is different from n
    4.2 If the attack is targeted with target label m and the original example has label n, STOP when the argmax of
        the model output of the last found example is m
        
    Code example:
    
    condition1 = (int(torch.argmax(out)) != self.loss.neuron) and (self.loss.maximise == 0)
    condition2 = (int(torch.argmax(out)) == self.loss.neuron) and (self.loss.maximise == 1)
    if condition1 or condition2:
        return x, losses, outs
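
The same check can be written as a small standalone function, which makes the two cases easier to test in isolation. This is a sketch: the name `should_stop` is hypothetical, the model output is passed as a plain list of floats instead of a tensor, and it assumes (as the snippet above suggests) that `maximise == 0` marks an untargeted attack and `maximise == 1` a targeted one.

```python
def should_stop(out, neuron, maximise):
    """Return True when the attack has succeeded.

    out      : list of floats, the model output for the last found example
    neuron   : the original label n (untargeted) or the target label m (targeted)
    maximise : 0 for an untargeted attack, 1 for a targeted one (assumed meaning)
    """
    # Index of the largest output, i.e. the predicted label
    predicted = max(range(len(out)), key=lambda i: out[i])
    untargeted_success = (predicted != neuron) and (maximise == 0)
    targeted_success = (predicted == neuron) and (maximise == 1)
    return untargeted_success or targeted_success
```

For example, with `out = [0.1, 0.7, 0.2]` the predicted label is 1, so an untargeted attack on original label 1 keeps running, while a targeted attack with target label 1 stops.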

### Optimizer description
Create a file (preferably Markdown) in which all the arguments are explained, then write down the values suggested by the papers (if any) and the empirical ones found on the datasets.


**Example 1** 

**Name:**  Zero Stochastic Gradient Descent <br>
**Class:** zeroOptim.zeroSGD <br>
**Paper:** *Zeroth-order Nonconvex Stochastic Optimization: Handling Constraints, High-Dimensionality and Saddle-Points* (Krishnakumar Balasubramanian and Saeed Ghadimi) <br>


**Description:**

The zero-order Stochastic Gradient Descent is like the classical SGD, but with the gradient G estimated as follows:

$$G_{v}^{k} \equiv G_{v}(z_{k-1}, \xi_{k}, u_{k}) = \frac{1}{m_{k}} \sum_{j=1}^{m_{k}} \frac{F(z_{k-1} + v u_{k,j}, \xi_{k,j}) - F(z_{k-1}, \xi_{k,j})}{v} u_{k,j}$$

where: <br>
$z_{k}$ is our optimization parameter <br>
$\xi_{k}$ is a sample of our distribution <br>
$u_{k,j} \sim N(0, I_{d})$ <br>
$m_{k}$ is an input parameter <br>
$v$ is the Gaussian smoothing parameter
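
The estimator above averages finite differences of F along random Gaussian directions. The sketch below illustrates it in plain Python so that it runs without dependencies; it is not the `zeroOptim.zeroSGD` implementation. The function name `zeroth_order_gradient` is hypothetical, F is assumed to be deterministic (the sample $\xi_{k,j}$ is dropped), and lists of floats stand in for tensors. A real optimizer would likely vectorize the loop with torch and evaluate the directions in batches.

```python
import random

def zeroth_order_gradient(F, z, v, m, rng=random):
    """Estimate the gradient of F at z from m Gaussian directions.

    F   : callable taking a list of floats and returning a float
    z   : current point z_{k-1}, a list of d floats
    v   : Gaussian smoothing parameter (v > 0)
    m   : number of random directions m_k
    rng : source of randomness exposing gauss(mu, sigma)
    """
    d = len(z)
    f_z = F(z)  # F(z_{k-1}), shared by all m finite differences
    grad = [0.0] * d
    for _ in range(m):
        # u_{k,j} ~ N(0, I_d)
        u = [rng.gauss(0.0, 1.0) for _ in range(d)]
        # F(z_{k-1} + v * u_{k,j})
        f_shift = F([zi + v * ui for zi, ui in zip(z, u)])
        scale = (f_shift - f_z) / v
        for i in range(d):
            grad[i] += scale * u[i] / m
    return grad
```

On a simple quadratic such as `F(z) = sum(t * t for t in z)`, the estimate approaches the true gradient `2 * z` as `m` grows, at the cost of `m + 1` function evaluations per step.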

**Args:**
    
    Name            Type                Description
    x:              (torch.tensor)      The variable of our optimization problem. Should be a 3D tensor (img)
    v:              (float)             The Gaussian smoothing parameter
    n_gradient:     (list)              Number of normal vectors to generate at every step (m_k)
    ak:             (list)              Pseudo learning rate at every step
    epsilon:        (float)             The upper bound of the infinity norm
    C:              (tuple)             The boundaries of the pixel. Default is (0, 1)
    max_steps:      (int)               The maximum number of steps. Default is 100
    verbose:        (int)               Display information or not. Default is 0
    additional_out: (bool)              Also return all the x. Default is False
    tqdm_disable:   (bool)              Disable the tqdm bar. Default is False
     
**Suggested values:**

$v = \sqrt{\frac{2B_{L_{\sigma}}}{N(d+3)^3}}$, 
$\alpha_{k} =\frac{1}{\sqrt{N}}$,
$m_{k} = 2B_{L_{\sigma}}(d + 5)N$,
$\forall k \geq 1$

where:<br>
- *N* is the number of steps <br>
- *d* is the dimension of *x* <br>
- $B_{L_{\sigma}} \geq \max\bigg\{\sqrt{\frac{B^2 + \sigma^2}{L}}, 1\bigg\}$
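
To get a feel for the magnitudes, the suggested values can be evaluated numerically. The helper below is a sketch with a hypothetical name, `suggested_values`. It takes the constants B, sigma and L that appear in the bound above (their meaning comes from the cited paper and is not spelled out here), uses the smallest admissible $B_{L_{\sigma}}$, and rounds $m_{k}$ up to an integer, which is an assumption of this example.

```python
import math

def suggested_values(N, d, B, sigma, L):
    """Return (v, alpha_k, m_k) from the suggested-value formulas.

    N           : number of steps
    d           : dimension of x
    B, sigma, L : problem constants from the bound on B_{L_sigma}
    """
    # Smallest value allowed by the lower bound on B_{L_sigma}
    B_L_sigma = max(math.sqrt((B ** 2 + sigma ** 2) / L), 1.0)
    v = math.sqrt(2 * B_L_sigma / (N * (d + 3) ** 3))
    alpha = 1.0 / math.sqrt(N)
    # m_k must be an integer number of directions, so round up
    m = math.ceil(2 * B_L_sigma * (d + 5) * N)
    return v, alpha, m
```

With $B_{L_{\sigma}} = 1$, a 3x32x32 image (d = 3072) and N = 100 steps, the formula gives $m_{k} = 615400$ directions per step.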

