# Guidelines

TODO: import modules where they are needed, not in one centralized cell

Saving and loading model states: https://pytorch.org/tutorials/beginner/saving_loading_models.html

# Introduction 

## Aim

## Data

# Import (Remove section later on)

In [1]:
import numpy as np
import torch
import matplotlib.pyplot as plt
import pandas as pd

## Setup

The flags below configure the notebook:

In [2]:
# Whether to tune the hyperparameters in this notebook
# Note that this might take a long time (especially for Adam)
hyperparameter_tune = False

In [3]:
# Whether to use the GPU, if it's not available, this will be ignored
use_cuda = True

device = torch.device('cuda' if use_cuda and torch.cuda.is_available() else 'cpu')
print("Device chosen is {}".format(device))

Device chosen is cuda


We now load the dataset:

**TODO add data downloading**

In [4]:
# download

In [5]:
from data_utils import get_mnist

train_dataset, test_dataset = get_mnist(normalize=True)

We set up the training parameters once and reuse them throughout the notebook, which keeps the downstream code readable:

In [6]:
from training import accuracy

training_config = {
    # Loss function
    'loss_fun': torch.nn.CrossEntropyLoss(),
    # Performance evaluation function
    'metric_fun': accuracy,
    # The device to train on
    'device': device,
    # Number of epochs
    'epochs': 10
}

test_config = training_config.copy()
test_config.pop('epochs');
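
The copy-then-pop pattern above exists because the same dictionary is unpacked as keyword arguments into different functions. The sketch below illustrates why `epochs` has to be removed before evaluation. It is a minimal illustration only: `evaluate` is a hypothetical stand-in (the real `testing` signature is not shown here), and the string values replace the actual loss, metric and device objects.

```python
# Placeholder values stand in for the real loss, metric and device objects.
training_config = {
    'loss_fun': 'cross_entropy',
    'metric_fun': 'accuracy',
    'device': 'cpu',
    'epochs': 10,
}

# Shallow copy: the two dicts have independent keys but share the same values.
test_config = training_config.copy()
test_config.pop('epochs')


def evaluate(loss_fun, metric_fun, device):
    """Hypothetical evaluation function that takes no `epochs` argument."""
    return f"evaluating on {device}"


# Unpacking the reduced config works.
result = evaluate(**test_config)

# Unpacking the full config fails because of the unexpected `epochs` keyword.
try:
    evaluate(**training_config)
    raised = False
except TypeError:
    raised = True
```

Because the copy is shallow, removing a key from `test_config` leaves `training_config` untouched, while the loss function and device objects themselves are shared between the two.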

Note that the model has a 10-dimensional output, where each output is passed through softmax. Given a batch of outputs

$$Z = \begin{bmatrix} \mathbf z_1 & \dots & \mathbf z_B \end{bmatrix}^\top \in \mathbb R^{B \times 10}$$

with $B$ the batch size, we first retrieve the index of the maximal component of each $\mathbf z_i$:

$$\hat y_i = \text{argmax}_{k = 1, \ldots, 10} \; z_{ik}, \quad i = 1, \ldots, B$$

and then compute the accuracy:

$$\text{acc} = \frac 1 B \sum_{i=1}^B I\left\{ \hat y_i = y_i \right\} $$

with $I$ the indicator function and $y_i \in \{1, \ldots, 10\}$ the true target. 
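
The two formulas above can be checked on a tiny batch. The following is a NumPy restatement for illustration only (the notebook's own `accuracy` operates on PyTorch tensors); `accuracy_np` and the example values are assumptions made up for this sketch. Note that the code uses zero-based class indices, whereas the formulas index classes from 1 to 10.

```python
import numpy as np


def accuracy_np(z, y):
    """Fraction of rows of z whose argmax equals the target in y."""
    y_hat = z.argmax(axis=1)          # predicted class per sample
    return float((y_hat == y).mean())  # mean of the indicator function


# Batch of B = 4 outputs over 10 classes; only the largest entry matters.
z = np.zeros((4, 10))
z[0, 3] = 2.0
z[1, 7] = 1.5
z[2, 0] = 0.9   # predicted class 0, but the target below is 1
z[3, 9] = 3.0

y = np.array([3, 7, 1, 9])

acc = accuracy_np(z, y)  # three of four predictions are correct
```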

In [7]:
# View the source code
??accuracy

Signature: accuracy(yhat, y)
Docstring: <no docstring>
Source:   
def accuracy(yhat, y):
    prediction = yhat.argmax(dim=1)
    return (y.eq(prediction)).to(float).mean().item()
File:      /media/maousi/Data/Documents/Programmation/courses/DS-MA2/optml_project/training.py
Type:      function


# Model

We use a simple standard model for the MNIST dataset (can be found [here](https://github.com/floydhub/mnist/blob/master/ConvNet.py)).

In [8]:
from net import Net

In [9]:
??Net

Init signature: Net()
Source:        
class Net(nn.Module):
    """ConvNet -> Max_Pool -> RELU -> ConvNet -> Max_Pool -> RELU -> FC -> RELU -> FC -> SOFTMAX"""
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20, 

# Hyperparameter tuning

In [10]:
from training import tune_optimizer
from optimizer import AdamOptimizer, NesterovOptimizer, MiniBatchOptimizer

If the `hyperparameter_tune` flag was set to `True` above, the following code runs hyperparameter tuning on all optimizers. One can either run k-fold cross-validation (by providing `n_folds`) or use a simple train/test split (by providing `train_ratio`).

If the flag is set to `False`, the cell below will simply set up the hyperparameters that we carefully cross-validated. Check the notebook **Hyperparam-tuning.ipynb** for details.
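
To make the difference between the two validation modes concrete, here is a minimal sketch of how the sample indices could be partitioned in each case. The internals of `tune_optimizer` are not shown in this notebook, so `kfold_indices` and `holdout_indices` are hypothetical helpers written for illustration; only the parameter names `n_folds` and `train_ratio` come from the text above.

```python
import numpy as np


def kfold_indices(n_samples, n_folds, seed=0):
    """Yield (train_idx, val_idx) pairs; every sample is validated exactly once."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n_samples), n_folds)
    for k in range(n_folds):
        val_idx = folds[k]
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != k])
        yield train_idx, val_idx


def holdout_indices(n_samples, train_ratio, seed=0):
    """Single random split: the first train_ratio share trains, the rest validates."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_samples)
    n_train = int(train_ratio * n_samples)
    return perm[:n_train], perm[n_train:]


# k-fold: 5 train/validation pairs over 10 samples
splits = list(kfold_indices(n_samples=10, n_folds=5))

# hold-out: one 80/20 split over the same 10 samples
train_idx, val_idx = holdout_indices(n_samples=10, train_ratio=0.8)
```

k-fold trains `n_folds` models per hyperparameter setting and averages their scores, so it is slower but less sensitive to one lucky or unlucky split than the single hold-out split.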

In [11]:
# Pre-define best parameters, used if hyperparameter_tune = False
optimizers = {
    AdamOptimizer: {'lr': 8e-05, 'beta1': 0.9, 'beta2': 0.999, 'weight_decay': 0.01, 'epsilon': 1e-08, 'batch_size': 32},
    NesterovOptimizer: {'lr': 5e-05, 'batch_size': 64},
    #MiniBatchOptimizer: None
}

## Utility function

## Nesterov

In [12]:
from optimizer import NesterovOptimizer

## Adam

## Minibatch

## Comparison

# Attack on naive model



In [13]:
from data_utils import build_data_loaders
from training import training, testing

## Train naive models

In [14]:
naive_networks = dict()
data_naive = list()
batch_log_interval = 0

for optimizer, optimizer_params in optimizers.items():
    print(f'--- {optimizer}')
    optimizer_params = optimizer_params.copy()
    
    net = Net().to(device)
    # Instantiate data loaders with selected batch size
    batch_size = optimizer_params.pop('batch_size')
    train_loader, test_loader = build_data_loaders(train_dataset, test_dataset, batch_size)
    # Instantiate optimizer
    optimizer_instance = optimizer(net.parameters(), **optimizer_params)
    # Train
    loss_train, acc_train = training(
        model=net, 
        dataset=train_loader, 
        optim=optimizer_instance,
        batch_log_interval=batch_log_interval,
        **training_config
    )
    # Test
    loss_test, acc_test = testing(
        model=net,
        dataset=test_loader,
        **test_config
    )
    # Log
    data_naive.append({
        'optimizer': str(optimizer),
        'loss_train': loss_train,
        'acc_train': acc_train,
        'loss_test': loss_test,
        'acc_test': acc_test
    })
    # Save naive model
    naive_networks[optimizer] = net

--- <class 'optimizer.AdamOptimizer'>
Launching training on cuda
epoch 0	avg epoch loss = 0.8359	avg epoch acc = 0.7553
epoch 1	avg epoch loss = 0.1242	avg epoch acc = 0.9628
epoch 2	avg epoch loss = 0.07036	avg epoch acc = 0.9789
epoch 3	avg epoch loss = 0.05043	avg epoch acc = 0.9844
epoch 4	avg epoch loss = 0.03867	avg epoch acc = 0.9882
epoch 5	avg epoch loss = 0.03056	avg epoch acc = 0.9909
epoch 6	avg epoch loss = 0.0246	avg epoch acc = 0.9932
epoch 7	avg epoch loss = 0.01974	avg epoch acc = 0.9946
epoch 8	avg epoch loss = 0.01579	avg epoch acc = 0.9957
epoch 9	avg epoch loss = 0.01247	avg epoch acc = 0.997
training took 44.15 s
Avg test loss = 0.0311	Avg test acc = 0.989
--- <class 'optimizer.NesterovOptimizer'>
Launching training on cuda
epoch 0	avg epoch loss = 1.453	avg epoch acc = 0.5861
epoch 1	avg epoch loss = 0.2344	avg epoch acc = 0.9324
epoch 2	avg epoch loss = 0.09442	avg epoch acc = 0.9705
epoch 3	avg epoch loss = 0.06535	avg epoch acc = 0.9792
epoch 4	avg epoch loss 

### Minibatch (for now, loop later)

### Adam

### Nesterov



## Attack naive models

In [15]:
from adversary import attack

In [16]:
epsilons = np.arange(0, 0.5, 0.05)
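
The `attack` function is imported from `adversary` and its source is not shown in this notebook. Assuming it applies an FGSM-style perturbation (an assumption, not something the notebook confirms), each `epsilon` would bound how far every input pixel may move. The sketch below illustrates that idea on a toy linear softmax classifier in NumPy rather than on the CNN; `softmax`, `cross_entropy`, `fgsm_linear` and the random data are all hypothetical and chosen only to keep the example self-contained.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def cross_entropy(logits, y):
    """Mean cross-entropy of integer targets y under softmax(logits)."""
    probs = softmax(logits)
    return float(-np.log(probs[np.arange(len(y)), y]).mean())


def fgsm_linear(x, y, W, b, epsilon):
    """One FGSM step for a linear softmax classifier (toy stand-in for the CNN)."""
    probs = softmax(x @ W + b)
    onehot = np.eye(W.shape[1])[y]
    # Gradient of the mean cross-entropy with respect to the inputs
    grad_x = (probs - onehot) @ W.T / len(y)
    # Move every input component by epsilon in the direction that increases the loss
    return x + epsilon * np.sign(grad_x)


rng = np.random.default_rng(0)
x = rng.random((8, 784))                   # 8 flattened 28x28 "images"
y = rng.integers(0, 10, size=8)            # random targets
W = rng.normal(scale=0.1, size=(784, 10))  # random linear classifier
b = np.zeros(10)

x_adv = fgsm_linear(x, y, W, b, epsilon=0.25)
loss_clean = cross_entropy(x @ W + b, y)
loss_adv = cross_entropy(x_adv @ W + b, y)
```

With `epsilon=0` the input is returned unchanged, which corresponds to the first entry of `epsilons`. This sketch does not clip the perturbed input back to a valid pixel range; whether that is appropriate depends on how the data were normalized.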

In [17]:
# use the lst_optimizer
# Only one optimizer used in this part?

### Minibatch (for now, loop later)

### Adam

### Nesterov

In [19]:
test_config

{'loss_fun': CrossEntropyLoss(),
 'metric_fun': <function training.accuracy(yhat, y)>,
 'device': device(type='cuda')}

In [20]:
# Use a separate list for the attack results, so that the training logs
# stored in data_naive above are not overwritten
data_attack_naive = list()

for optimizer, network in naive_networks.items():
    print(f'--- {optimizer}')
    
    for eps in epsilons:
        loss_attack, acc_attack = attack(
            model=network, 
            test_loader=test_loader, 
            epsilon=eps,
            **test_config
        )
        # Log
        data_attack_naive.append({
            'optimizer': str(optimizer),
            'epsilon': eps,
            'loss': loss_attack,
            'acc': acc_attack
        })

--- <class 'optimizer.AdamOptimizer'>
Epsilon: 0.00	Test Accuracy = 0.911
Epsilon: 0.05	Test Accuracy = 0.900
Epsilon: 0.10	Test Accuracy = 0.890
Epsilon: 0.15	Test Accuracy = 0.875
Epsilon: 0.20	Test Accuracy = 0.858
Epsilon: 0.25	Test Accuracy = 0.832
Epsilon: 0.30	Test Accuracy = 0.797
Epsilon: 0.35	Test Accuracy = 0.749
Epsilon: 0.40	Test Accuracy = 0.686
Epsilon: 0.45	Test Accuracy = 0.610
--- <class 'optimizer.NesterovOptimizer'>
Epsilon: 0.00	Test Accuracy = 0.975
Epsilon: 0.05	Test Accuracy = 0.969
Epsilon: 0.10	Test Accuracy = 0.964
Epsilon: 0.15	Test Accuracy = 0.958
Epsilon: 0.20	Test Accuracy = 0.947
Epsilon: 0.25	Test Accuracy = 0.929
Epsilon: 0.30	Test Accuracy = 0.905
Epsilon: 0.35	Test Accuracy = 0.874
Epsilon: 0.40	Test Accuracy = 0.825
Epsilon: 0.45	Test Accuracy = 0.755


## Comparison

**TODO COMPARE**
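
As a possible starting point for this comparison, the logged records can be grouped into one accuracy curve per optimizer. The sketch below is illustrative only: it uses three of the accuracies printed above per optimizer, shortens the optimizer names for readability (the notebook stores `str(optimizer)`), and the names `records`, `curves` and `drops` are made up for this example.

```python
from collections import defaultdict

# Subset of the accuracies printed above, in the same record layout as the log
records = [
    {'optimizer': 'AdamOptimizer', 'epsilon': 0.00, 'acc': 0.911},
    {'optimizer': 'AdamOptimizer', 'epsilon': 0.25, 'acc': 0.832},
    {'optimizer': 'AdamOptimizer', 'epsilon': 0.45, 'acc': 0.610},
    {'optimizer': 'NesterovOptimizer', 'epsilon': 0.00, 'acc': 0.975},
    {'optimizer': 'NesterovOptimizer', 'epsilon': 0.25, 'acc': 0.929},
    {'optimizer': 'NesterovOptimizer', 'epsilon': 0.45, 'acc': 0.755},
]

# Group (epsilon, accuracy) points by optimizer
curves = defaultdict(list)
for rec in records:
    curves[rec['optimizer']].append((rec['epsilon'], rec['acc']))

# Accuracy lost between the smallest and the largest epsilon
drops = {}
for name, points in curves.items():
    points.sort()
    drops[name] = round(points[0][1] - points[-1][1], 3)
```

On these numbers the Adam-trained model loses about 0.30 accuracy between the smallest and largest epsilon, and the Nesterov-trained model about 0.22.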

# Attack on robust model

## Train robust models

In [23]:
from adversary import protect

### Minibatch (for now, loop later)

### Adam

### Nesterov



In [24]:
robust_networks = dict()
batch_log_interval = 0
epsilon = 0.25

for optimizer, optimizer_params in optimizers.items():
    print(optimizer)
    # Instantiate model
    net = Net().to(device)
    # Instantiate optimizer
    optimizer_params = optimizer_params.copy()
    batch_size = optimizer_params.pop('batch_size')
    optimizer_instance = optimizer(net.parameters(), **optimizer_params)
    # Instantiate data loaders
    train_loader, test_loader = build_data_loaders(train_dataset, test_dataset, batch_size)
    # Train robust model
    protect(
        model=net,
        optim=optimizer_instance,
        train_loader=train_loader,
        test_loader=test_loader,
        epsilon=epsilon,
        **training_config
    )
    # Save robust net
    robust_networks[optimizer] = net

Epoch 0.00 | Test accuracy: 0.88678
Epoch 1.00 | Test accuracy: 0.92053
Epoch 2.00 | Test accuracy: 0.93650
Epoch 3.00 | Test accuracy: 0.94299
Epoch 4.00 | Test accuracy: 0.94918
Epoch 5.00 | Test accuracy: 0.95218
Epoch 6.00 | Test accuracy: 0.95517
Epoch 7.00 | Test accuracy: 0.95567
Epoch 8.00 | Test accuracy: 0.95627
Epoch 9.00 | Test accuracy: 0.95777
training took 68.52 s
Epoch 0.00 | Test accuracy: 0.67765
Epoch 1.00 | Test accuracy: 0.92864
Epoch 2.00 | Test accuracy: 0.96039
Epoch 3.00 | Test accuracy: 0.97323
Epoch 4.00 | Test accuracy: 0.97641
Epoch 5.00 | Test accuracy: 0.97900
Epoch 6.00 | Test accuracy: 0.98059
Epoch 7.00 | Test accuracy: 0.98358
Epoch 8.00 | Test accuracy: 0.98597
Epoch 9.00 | Test accuracy: 0.98706
training took 31.16 s


## Attack robust models

### Minibatch (for now, loop later)

### Adam

**TODO put the loop**

## Comparison

**TODO**

# Attack 2

# Comparative analysis

### Minibatch (for now)

In [None]:
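plt.figure(figsize=(5,5))
# accuracy_naive and accuracy_robust are not defined anywhere above: each must
# hold one accuracy per entry of `epsilons` before this cell can run
plt.plot(epsilons, accuracy_naive, "*-", c='blue', label='Naive Model')
plt.plot(epsilons, accuracy_robust, "*-", c='orange', label='Robust Model')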

plt.yticks(np.arange(0, 1.1, step=0.1))
plt.xticks(np.arange(0, 0.5, step=0.05))

plt.title("Accuracy vs Epsilon")
plt.xlabel("Epsilon")
plt.ylabel("Accuracy")
plt.legend();

Planned plots:

* difference between naive and robust models (optimizer as hue)