# guidelines

TODO: import modules where they are needed, not in one central cell.

Saving and loading model states: see https://pytorch.org/tutorials/beginner/saving_loading_models.html

# Introduction 

## Aim

## Data

First load the dataset:

In [1]:
from data_utils import get_mnist

train_dataset, test_dataset = get_mnist(normalize=True)

In [2]:
import numpy as np
import random
import torch
import matplotlib.pyplot as plt
import pandas as pd

## Setup

The flags below configure the notebook:

In [3]:
# Whether to tune the hyperparameters in this notebook
# Note that this might take a long time (especially for Adam)
hyperparameter_tune = False
prot_hyperparameter_tune = False

In [4]:
# Whether to use the GPU, if it's not available, this will be ignored
use_cuda = True
device = torch.device('cuda' if use_cuda and torch.cuda.is_available() else 'cpu')
print("Device chosen is {}".format(device))

Device chosen is cpu


We set up the training parameters used throughout the notebook, to keep downstream code readable.

Note that the model has a 10-dimensional output, with each output passed through a softmax. Given an output

$$Z = \begin{bmatrix} \mathbf z_1 & \dots & \mathbf z_B \end{bmatrix}^\top \in \mathbb R^{B \times 10}$$

with $B$ the batch size, we first retrieve the maximal component of each $\mathbf z_i$:

$$\hat y_i = \text{argmax}_{k = 1, \ldots, 10} \; z_{ik}, \quad i = 1, \ldots, B$$

and then compute the accuracy:

$$\text{acc} = \frac 1 B \sum_{i=1}^B I\left\{ \hat y_i = y_i \right\} $$

with $I$ the indicator function and $y_i \in \{1, \ldots, 10\}$ the true target. 
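
As a quick sanity check of the formula above, here is a minimal pure-Python sketch on a made-up batch. The helper names (`argmax`, `batch_accuracy`) and the toy scores are illustrative assumptions, not part of the project code. It uses 3 classes instead of 10 for brevity, and labels start at 0 as in PyTorch rather than at 1 as in the formula.

```python
# Pure-Python sketch of the accuracy formula (no PyTorch needed).
# The scores and labels are made-up toy values, not MNIST outputs.

def argmax(row):
    """Index of the largest entry of a list (first one on ties)."""
    return max(range(len(row)), key=lambda k: row[k])

def batch_accuracy(scores, targets):
    """Fraction of rows whose argmax equals the true label."""
    predictions = [argmax(row) for row in scores]
    hits = sum(int(p == t) for p, t in zip(predictions, targets))
    return hits / len(targets)

# B = 4 samples, 3 classes for brevity (the notebook uses 10).
toy_scores = [
    [0.1, 0.7, 0.2],  # predicted class 1
    [0.8, 0.1, 0.1],  # predicted class 0
    [0.3, 0.3, 0.4],  # predicted class 2
    [0.2, 0.5, 0.3],  # predicted class 1
]
toy_targets = [1, 0, 1, 1]  # the third sample is misclassified

toy_acc = batch_accuracy(toy_scores, toy_targets)
print(toy_acc)  # 3 of 4 correct -> 0.75
```

The `accuracy` function imported from `training` does the same with `argmax(dim=1)` on a tensor.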

In [5]:
from training import accuracy

training_config = {
    # Loss function
    'loss_fun': torch.nn.CrossEntropyLoss(),
    # Performance evaluation function
    'metric_fun': accuracy,
    # The device to train on
    'device': device,
    # Number of epochs
    'epochs': 10,
}

test_config = training_config.copy()
test_config.pop('epochs')

10

In [6]:
# View the source code
??accuracy

Signature: accuracy(yhat, y)
Docstring: <no docstring>
Source:   
def accuracy(yhat, y):
    prediction = yhat.argmax(dim=1)
    return (y.eq(prediction)).to(float).mean().item()
File:      ~/Documents/EPFL/CS439/optml_project/training.py
Type:      function


# Model

We use a simple, standard convolutional model for the MNIST dataset (source available [here](https://github.com/floydhub/mnist/blob/master/ConvNet.py)).

In [7]:
from net import Net

In [8]:
??Net

Init signature: Net()
Source:        
class Net(nn.Module):
    """ConvNet -> Max_Pool -> RELU -> ConvNet -> Max_Pool -> RELU -> FC -> RELU -> FC -> SOFTMAX"""
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 20, 5, 1)
        self.conv2 = nn.Conv2d(20,

# Hyperparameter tuning

In [9]:
from torch.optim import Optimizer
from training import tune_optimizer
from optimizer import AdamOptimizer, NesterovOptimizer, MiniBatchOptimizer
from data_utils import get_best_hyperparams

If the `hyperparameter_tune` flag was set to `True` above, the following code runs hyperparameter tuning on all optimizers. One can either run K-fold cross-validation (by providing `nfolds`, as the cells below do) or use a simple train/test split (by providing `train_ratio`).

If the flag is set to `False`, the cell below will simply set up the hyperparameters that we carefully cross-validated:
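
To make the two validation schemes concrete, here is a minimal sketch of how sample indices could be split in each case. The internals of `tune_optimizer` are not shown in this notebook, so the helpers `kfold_indices` and `holdout_indices` are hypothetical and only illustrate the idea.

```python
import random

def kfold_indices(n_samples, nfolds, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into nfolds roughly equal folds.
    folds = [indices[i::nfolds] for i in range(nfolds)]
    for k in range(nfolds):
        val_idx = folds[k]
        train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train_idx, val_idx

def holdout_indices(n_samples, train_ratio, seed=0):
    """Single shuffled split: the first train_ratio share is used for training."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    cut = int(train_ratio * n_samples)
    return indices[:cut], indices[cut:]

splits = list(kfold_indices(n_samples=10, nfolds=3))
for train_idx, val_idx in splits:
    print(len(train_idx), len(val_idx))

train_idx, val_idx = holdout_indices(n_samples=10, train_ratio=0.8)
print(len(train_idx), len(val_idx))
```

K-fold trains `nfolds` models per hyperparameter set and averages their scores, which is presumably where the reported standard deviations come from. The simple split trains once and is correspondingly cheaper.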

In [10]:
optimizers = {
    AdamOptimizer: get_best_hyperparams('./res/adam_tuning_round3.json'),
    NesterovOptimizer: get_best_hyperparams('./res/nesterov_tuning_round2.json'),
    MiniBatchOptimizer: get_best_hyperparams('./res/minibatch_tuning_round2.json')
}

## Adam

In [11]:
search_grid_adam = {
        'lr': np.linspace(0.001, 0.01, 2),
        'beta1':  np.linspace(0.1, 0.9, 2),
        'beta2': np.linspace(0.5, 0.999, 2),
        'batch_size': [32, 64, 128],
        'weight_decay': np.linspace(0.001, 0.1, 2),
        'epsilon': np.linspace(1e-10, 1e-8, 2),
    }

if hyperparameter_tune:
    results_adam = tune_optimizer(
        model=Net().to(device),
        optim_fun=AdamOptimizer,
        xtrain=train_dataset.data,
        ytrain=train_dataset.targets,
        search_grid=search_grid_adam,
        nfolds=3,
        **training_config)

else:
    results_adam = optimizers[AdamOptimizer]
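
To get a feel for the cost of this search, the sketch below enumerates the grid with `itertools.product`. It assumes `tune_optimizer` evaluates every combination exhaustively, with one training run per fold, which is not confirmed by the code shown here. `expand_grid` is a hypothetical helper.

```python
from itertools import product

# Same keys as search_grid_adam, with plain lists standing in for
# np.linspace(a, b, 2), which returns just the two endpoints [a, b].
grid = {
    'lr': [0.001, 0.01],
    'beta1': [0.1, 0.9],
    'beta2': [0.5, 0.999],
    'batch_size': [32, 64, 128],
    'weight_decay': [0.001, 0.1],
    'epsilon': [1e-10, 1e-8],
}

def expand_grid(search_grid):
    """Return one dict per point of the Cartesian product of all value lists."""
    keys = list(search_grid)
    return [dict(zip(keys, values)) for values in product(*search_grid.values())]

combos = expand_grid(grid)
n_fits = len(combos) * 3  # assuming one training run per fold, with nfolds=3
print(len(combos), n_fits)  # 96 288
```

With 10 epochs per run, this is why the tuning flag is off by default.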

## Nesterov

In [12]:
search_grid_nesterov = {
    'lr': np.logspace(0, 1),
    'batch_size': [32, 64, 128]
}

if hyperparameter_tune:
    results_nesterov = tune_optimizer(
        model=Net().to(device),
        optim_fun=NesterovOptimizer,
        xtrain=train_dataset.data,
        ytrain=train_dataset.targets,
        search_grid=search_grid_nesterov,
        nfolds=3,
        **training_config
    )

else:
    results_nesterov = optimizers[NesterovOptimizer]

## Minibatch

In [13]:
dec_lr_set =  [0]*1 + [1]*1
random.shuffle(dec_lr_set)
search_grid_mini  = {
        'lr': np.linspace(0.00001, 0.01, 5),
        'batch_size': [32, 64, 128],
        'decreasing_lr': dec_lr_set,
    }
if hyperparameter_tune:
    results_mini = tune_optimizer(
        model=Net().to(device),
        optim_fun=MiniBatchOptimizer,
        xtrain=train_dataset.data,
        ytrain=train_dataset.targets,
        search_grid=search_grid_mini,
        nfolds=3,
        **training_config
    )

else:
    results_mini = optimizers[MiniBatchOptimizer]

In [40]:
print("ADAM: Highest Test Accuracy {:.4f} with standard deviation of {:.4f}".format(results_adam["metric_test"], results_adam["metric_test_std"]))
print("Hyperparameter set: Learning rate =  {:.4f}, Beta1 = {:.1f}, Beta2 = {:.3f}, Weight decay = {:.2f}, Epsilon = {:.8f},  Batch Size = {:.0f}\n".format(results_adam["lr"], results_adam["beta1"], results_adam["beta2"], results_adam["weight_decay"], results_adam["epsilon"], results_adam['batch_size']))
print("NESTEROV: Highest Test Accuracy {:.4f} with standard deviation of {:.4f}".format(results_nesterov["metric_test"], results_nesterov["metric_test_std"]))
print("Hyperparameter set: Learning rate =  {:.4f}, Batch Size = {:.0f}\n".format(results_nesterov["lr"], results_nesterov["batch_size"]))
print("MINIBATCH: Highest Test Accuracy {:.4f} with standard deviation of {:.4f}".format(results_mini["metric_test"], results_mini["metric_test_std"]))
print("Hyperparameter set: Learning rate =  {:.4f}, Decreasing Learning rate {:.1f}, Batch Size = {:.0f}\n".format(results_mini["lr"], results_mini["decreasing_lr"], results_mini["batch_size"]))


ADAM: Highest Test Accuracy 0.9868 with standard deviation of 0.0007
Hyperparameter set: Learning rate =  0.0001, Beta1 = 0.9, Beta2 = 0.999, Weight decay = 0.01, Epsilon = 0.00000001,  Batch Size = 32

NESTEROV: Highest Test Accuracy 0.9876 with standard deviation of 0.0010
Hyperparameter set: Learning rate =  0.0001, Batch Size = 64

MINIBATCH: Highest Test Accuracy 0.9886 with standard deviation of 0.0002
Hyperparameter set: Learning rate =  0.2639, Decreasing Learning rate 0.0, Batch Size = 128



## Comparison

### TODO

# Attack on naive model



In [41]:
from data_utils import build_data_loaders
from training import training, testing

## Train naive models

### Adam, Nesterov and Minibatch



In [51]:
naive_networks = dict()
data_naive = list()
batch_log_interval = 0

for optimizer, optimizer_params in optimizers.items():
    print(f'--- {optimizer}')
    optimizer_params = optimizer_params.copy()
    
    net = Net().to(device)
    # Instantiate data loaders with selected batch size
    batch_size = int(optimizer_params.pop('batch_size'))
    metric_test = optimizer_params.pop('metric_test')
    metric_test_std = optimizer_params.pop('metric_test_std')
    train_loader, test_loader = build_data_loaders(train_dataset, test_dataset, batch_size)
    # Instantiate optimizer
    optimizer_instance = optimizer(net.parameters(), **optimizer_params)
    # Train
    loss_train, acc_train = training(
        model=net, 
        dataset=train_loader, 
        optim=optimizer_instance,
        batch_log_interval=batch_log_interval,
        **training_config
    )
    # Test
    loss_test, acc_test = testing(
        model=net,
        dataset=test_loader,
        **test_config
    )
    # Log
    data_naive.append({
        'optimizer': str(optimizer),
        'loss_train': loss_train,
        'acc_train': acc_train,
        'loss_test': loss_test,
        'acc_test': acc_test
    })
    # Save naive model
    naive_networks[optimizer] = net

--- <class 'optimizer.AdamOptimizer'>
Launching training on cpu


KeyboardInterrupt: 

### Minibatch (for now, loop later)

## Attack naive models
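
The source of `attack` is not shown in this notebook. Variable names further down (`accuracy_fgsm`, `accuracy_pgd`) suggest a fast gradient sign method (FGSM) attack, so the sketch below illustrates that idea under this assumption. It uses a toy linear classifier with logistic loss in pure Python rather than the project's `Net`, and `fgsm_linear` is a hypothetical helper: each input is moved by `epsilon` in the direction of the sign of the loss gradient, then clipped to the valid range.

```python
import math

def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))

def fgsm_linear(x, y, w, epsilon):
    """One FGSM step against a linear classifier with logistic loss.

    x: input features in [0, 1]; y: label in {-1, +1}; w: weights.
    The loss is log(1 + exp(-y * <w, x>)), whose gradient w.r.t. x is
    -y * sigmoid(-y * <w, x>) * w.
    """
    margin = y * sum(wi * xi for wi, xi in zip(w, x))
    coeff = -y * sigmoid(-margin)
    grad = [coeff * wi for wi in w]
    sign = [(g > 0) - (g < 0) for g in grad]
    # Step in the direction that increases the loss, then clip to [0, 1].
    return [min(1.0, max(0.0, xi + epsilon * s)) for xi, s in zip(x, sign)]

# Toy weights and input, chosen by hand for illustration.
w = [2.0, -1.0, 0.5]
x = [0.9, 0.2, 0.5]
y = 1

scores = []
for eps in (0.0, 0.1, 0.3):
    x_adv = fgsm_linear(x, y, w, eps)
    score = sum(wi * xi for wi, xi in zip(w, x_adv))
    scores.append(score)
    print(eps, x_adv, score)
```

The score for the true class shrinks as `epsilon` grows, which mirrors the accuracy drop reported for the naive models below.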

In [33]:
from adversary import attack

In [34]:
epsilons = np.arange(0, 0.5, 0.05)

In [35]:
# use the lst_optimizer
# Only one optimizer used in this part?

### Adam

In [36]:
accuracy_naive_adam = []
losses_naive_adam = []

for eps in epsilons:
    loss_attack, acc_attack = attack(
        net_naive_adam, training_config['loss_fun'], training_config['metric_fun'],
        test_loader, epsilon=eps, device=device
    )
    accuracy_naive_adam.append(acc_attack)
    losses_naive_adam.append(loss_attack)

Epsilon: 0.00	Test Accuracy = 0.970
Epsilon: 0.05	Test Accuracy = 0.965
Epsilon: 0.10	Test Accuracy = 0.956
Epsilon: 0.15	Test Accuracy = 0.947
Epsilon: 0.20	Test Accuracy = 0.932
Epsilon: 0.25	Test Accuracy = 0.916
Epsilon: 0.30	Test Accuracy = 0.891
Epsilon: 0.35	Test Accuracy = 0.855
Epsilon: 0.40	Test Accuracy = 0.809
Epsilon: 0.45	Test Accuracy = 0.745


### Nesterov

In [37]:
data_naive = list()

for optimizer, network in naive_networks.items():
    print(f'--- {optimizer}')
    
    for eps in epsilons:
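        loss_attack, acc_attack = attack(
            model=network, 
            loss_fun=training_config['loss_fun'],
            # Keyword name assumed; the positional calls elsewhere pass the metric third
            metric_fun=training_config['metric_fun'],
            test_loader=test_loader, 
            epsilon=eps, 
            device=training_config['device']
        )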
        # Log
        data_naive.append({
            'optimizer': str(optimizer),
            'epsilon': eps,
            'loss': loss_attack,
            'acc': acc_attack
        })

### Minibatch (for now, loop later)

In [38]:
accuracy_naive = []
losses_naive = []

for eps in epsilons:
    loss_attack, acc_attack = attack(
        net_naive_mini, training_config['loss_fun'], training_config['metric_fun'],
        test_loader, epsilon=eps, device=device
    )
    accuracy_naive.append(acc_attack)
    losses_naive.append(loss_attack)

Epsilon: 0.00	Test Accuracy = 0.969
Epsilon: 0.05	Test Accuracy = 0.963
Epsilon: 0.10	Test Accuracy = 0.954
Epsilon: 0.15	Test Accuracy = 0.944
Epsilon: 0.20	Test Accuracy = 0.933
Epsilon: 0.25	Test Accuracy = 0.915
Epsilon: 0.30	Test Accuracy = 0.888
Epsilon: 0.35	Test Accuracy = 0.854
Epsilon: 0.40	Test Accuracy = 0.805
Epsilon: 0.45	Test Accuracy = 0.743


## Comparison

# Attack on robust model

## Hyperparameter optimization on robust models

If the `prot_hyperparameter_tune` flag was set to `True` above, the following code runs hyperparameter tuning on all optimizers for the robust models. One can either run K-fold cross-validation (by providing `nfolds`) or use a simple train/test split (by providing `train_ratio`).


In [53]:
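# Needed by the tuning cells below when `prot_hyperparameter_tune` is True
from adversary import protected_training

prot_optimizers = {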
    AdamOptimizer: get_best_hyperparams('./res/prot_adam_tuning.json'),
    NesterovOptimizer: get_best_hyperparams('./res/prot_nesterov_tuning.json'),
    MiniBatchOptimizer: get_best_hyperparams('./res/prot_minibatch_tuning.json')
}

### Adam

In [54]:
search_grid_adam = {
        'lr': np.linspace(0.001, 0.01, 2),
        'beta1':  np.linspace(0.1, 0.9, 2),
        'beta2': np.linspace(0.5, 0.999, 2),
        'batch_size': [32, 64, 128],
        'weight_decay': np.linspace(0.001, 0.1, 2),
        'epsilon': np.linspace(1e-10, 1e-8, 2),
    }

if prot_hyperparameter_tune:
    results_adam_prot = tune_optimizer(
        model=Net().to(device),
        optim_fun=AdamOptimizer,
        xtrain=train_dataset.data,
        ytrain=train_dataset.targets,
        search_grid=search_grid_adam,
        nfolds=3,
        func=protected_training,
        **training_config)

else:
    results_adam_prot = prot_optimizers[AdamOptimizer]

### Nesterov

In [55]:
search_grid_nesterov = {
    'lr': np.logspace(0, 1),
    'batch_size': [32, 64, 128]
}

if prot_hyperparameter_tune:
    results_nesterov_prot = tune_optimizer(
        model=Net().to(device),
        optim_fun=NesterovOptimizer,
        xtrain=train_dataset.data,
        ytrain=train_dataset.targets,
        search_grid=search_grid_nesterov,
        nfolds=3,
        func=protected_training,
        **training_config
    )

else:
    results_nesterov_prot = prot_optimizers[NesterovOptimizer]

### Minibatch

In [56]:
dec_lr_set =  [0]*1 + [1]*1
random.shuffle(dec_lr_set)
search_grid_mini  = {
        'lr': np.linspace(0.00001, 0.01, 5),
        'batch_size': [32, 64, 128],
        'decreasing_lr': dec_lr_set,
    }
if prot_hyperparameter_tune:
    results_mini_prot = tune_optimizer(
        model=Net().to(device),
        optim_fun=MiniBatchOptimizer,
        xtrain=train_dataset.data,
        ytrain=train_dataset.targets,
        search_grid=search_grid_mini,
        nfolds=3,
        func=protected_training,
        **training_config
    )

else:
    results_mini_prot = prot_optimizers[MiniBatchOptimizer]

## Train robust models

In [61]:
from adversary import protected_training

### Adam, Nesterov & MiniBatch



In [64]:
robust_networks = dict()
batch_log_interval = 0
epsilon = 0.25

for optimizer, optimizer_params in prot_optimizers.items():
    print(f'--- {optimizer}')
    # Instantiate model
    net = Net().to(device)
    # Instantiate optimizer
    optimizer_params = optimizer_params.copy()
    batch_size = int(optimizer_params.pop('batch_size'))
    metric_test = optimizer_params.pop('metric_test')
    metric_test_std = optimizer_params.pop('metric_test_std')
    optimizer_instance = optimizer(net.parameters(), **optimizer_params)
    # Instantiate data loaders
    train_loader, test_loader = build_data_loaders(train_dataset, test_dataset, batch_size)
    # Train robust model
    protected_training(
        model=net,
        dataset=train_loader,
        optim=optimizer_instance,
        batch_log_interval=batch_log_interval,
        **training_config
    )
    # Save robust net
    robust_networks[optimizer] = net

--- <class 'optimizer.AdamOptimizer'>


KeyboardInterrupt: 

## Attack robust models

In [None]:
accuracy_fgsm = dict()
losses_fgsm = dict()

accuracy_pgd = dict()
losses_pgd = dict()

for optimizer, optimizer_params in prot_optimizers.items():
    # Instantiate model
    net = robust_networks[optimizer]
    # Instantiate optimizer
    optimizer_params = optimizer_params.copy()
    batch_size = optimizer_params.pop('batch_size')



### Minibatch (for now, loop later)

In [None]:
accuracy_robust = []
losses_robust = []
epsilons = np.arange(0, 0.5, 0.05)

# This should be the first time the test loader is used
for eps in epsilons:
    loss_attack, acc_attack = attack(robust_net, training_config['loss_fun'], training_config['metric_fun'], prot_test_loader, epsilon=eps, device=device)
    accuracy_robust.append(acc_attack)
    losses_robust.append(loss_attack)

### Adam

In [None]:
accuracy_robust_adam = []
losses_robust_adam = []
# This should be the first time the test loader is used
for eps in epsilons:
    loss_attack, acc_attack = attack(robust_net_adam, training_config['loss_fun'], training_config['metric_fun'], prot_test_loader, epsilon=eps, device=device)
    accuracy_robust_adam.append(acc_attack)
    losses_robust_adam.append(loss_attack)

## Comparison

# Comparative analysis

### Minibatch (for now)

In [None]:
plt.figure(figsize=(5,5))
plt.plot(epsilons, accuracy_naive, "*-", c='blue', label='Naive Model')
plt.plot(epsilons, accuracy_robust, "*-", c='orange', label='Robust Model')

plt.yticks(np.arange(0, 1.1, step=0.1))
plt.xticks(np.arange(0, 0.5, step=0.05))

plt.title("Accuracy vs Epsilon")
plt.xlabel("Epsilon")
plt.ylabel("Accuracy")
plt.legend();

TODO: add more plots, for example:

* difference between naive and robust accuracy (optimizer as hue)