
Gradient Direction of Robust Models

This code reproduces part of the results presented in the paper "Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent". The implementation was developed on Linux, using Python 3 and PyTorch.

The main objective of the paper is to quantitatively answer the question "In what direction do the gradients of a model with respect to its inputs align after robust training?". We propose as an answer the direction that connects the current input to the closest example, in decision space, in the support of the closest inaccurate class. To test whether this direction is directly related to robustness, we propose a metric measuring the alignment between the gradient and the proposed direction. We show that this alignment increases with robust training, and that the proposed direction achieves closer alignment than another gradient-alignment metric from the literature. We also show that increasing the alignment of the gradient through a penalty term on the loss increases robustness.
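To make the metric concrete, here is a minimal PyTorch sketch of a per-example gradient alignment and of a loss with an alignment penalty (an illustration, not the repository's implementation; the function names, the delta_x tensor, and the weight parameter are assumptions):

```python
import torch
import torch.nn.functional as F

def gradient_alignment(model, x, y, delta_x):
    # Cosine similarity between the gradient of the loss with respect
    # to the input and a target direction delta_x, per example.
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return F.cosine_similarity(grad.flatten(1), delta_x.flatten(1), dim=1)

def penalized_loss(model, x, y, delta_x, weight=1.0):
    # Cross-entropy minus the mean alignment, so that minimizing the
    # loss also pushes the input-gradient toward delta_x.
    ce = F.cross_entropy(model(x), y)
    return ce - weight * gradient_alignment(model, x, y, delta_x).mean()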

The COPD dataset used in the paper is private and is not available to outside investigators. The respective code is provided for reference purposes only.

Setup

  • To install all the needed libraries, you can use the requirements_indirect.sh and requirements_direct.sh files. They assume you have conda or miniconda installed, and they create conda environments called gdrm_indirect and gdrm_direct with the prerequisites installed. Activate the corresponding environment before running the code: conda activate gdrm_indirect for experiments with the indirect method for estimating Δx (Section 2.3.2 in the paper), or conda activate gdrm_direct for experiments with the direct method for estimating Δx (Section 2.3.1 in the paper).
  • Each folder inside the src/ folder has its own requirements.txt listing imported libraries and versions.

Usage

Check the file running_commands.csv for the commands used for all the training and validation runs behind the paper and the results presented here.

  • All commands select the GPU indexed by 0, but you can change the gpus argument according to your needs.
  • For the test commands, replace the <timestamps-id> expression with the respective value of the training experiment folder.
  • For the test commands with the CIFAR-10 dataset, replace the <best_epoch> expression with the best epoch in terms of epsilon_0.5, considering values from epoch 33 to epoch 100 (see the sketch after this list for one way such a score can be computed).
  • For the ImageNet command, replace the <robustbench_model_name> expression with the desired model from the Linf eps=4/255 ImageNet RobustBench leaderboard.
  • You can run python -m src.indirect_method.train --help and python -m src.direct_method.train --help to see all available options for modifying the runs.
  • All commands should be run from the project base folder.
  • To check test scores, open the log.txt file inside the experiment folder (./runs/...).
  • The first time some datasets are used, H5 files are created for faster loading in subsequent runs, so the first run may take an unusually long time to start producing outputs.
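For context on the epsilon_0.5 score mentioned above: assuming it denotes the median robustness, i.e., the perturbation norm at which accuracy under attack drops to 50%, it can be read off an accuracy-versus-epsilon curve as in this minimal sketch (the function and argument names are illustrative, not from the repository):

```python
import numpy as np

def epsilon_at_accuracy(epsilons, accuracies, threshold=0.5):
    """Perturbation norm at which accuracy first falls below `threshold`;
    with threshold=0.5 this is the median robustness."""
    epsilons = np.asarray(epsilons, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    below = np.nonzero(accuracies < threshold)[0]
    if below.size == 0:
        return float(epsilons[-1])  # accuracy never drops below threshold
    i = below[0]
    if i == 0:
        return float(epsilons[0])
    # Linearly interpolate between the two points bracketing the crossing.
    e0, e1 = epsilons[i - 1], epsilons[i]
    a0, a1 = accuracies[i - 1], accuracies[i]
    return float(e0 + (a0 - threshold) * (e1 - e0) / (a0 - a1))
```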

Results of pre-trained models

Example pre-trained models are available at https://www.sci.utah.edu/~datasets/gradient-direction-of-robust-models/pretrained_models.zip. To reproduce the numbers in the tables below, use the commands provided in running_commands.csv, replacing the --load_checkpoint_g= and --load_checkpoint_d= arguments with the respective paths to the provided pre-trained models. For the MNIST-3/5, MNIST, and CIFAR-10 datasets, use the generator folder when training the models with the cosine alignment penalty, and the generator_reference folder when testing the alignment of the robust methods. For example, to generate the generator alignments and images for the Squares dataset, use:

```
python -m src.train --experiment=square_vrgan_test --gpus=0 --nepochs=1 --dataset_to_use=squares --skip_train=true --split_validation=test --vrgan_training=true --load_checkpoint_g=./pretrained_models/square/generator/state_dict_g_best_epoch
```

As another example, to get the images and black-box numbers for the cosine-penalty method on the MNIST-3/5 dataset, use:

```
python -m src.train --dataset_to_use=mnist --experiment=mnist_cosine_test_bbox --gpus=0 --split_validation=test --unet_downsamplings=2 --load_checkpoint_g=./pretrained_models/mnist35/generator_reference/state_dict_g_best_epoch --nepochs=1 --skip_train=true --epsilons_val_attack 0.02 0.04 0.06 0.1 0.14 0.2 0.4 0.6 0.8 1.0 1.2 1.4 --load_checkpoint_d=./pretrained_models/mnist35/cosine/state_dict_d_best_epoch --blackbox_attack=true
```

More complete results for the method, including results averaged over five random seeds, are given in the paper.

Direct Gradient Estimation (Section 2.3.1 in the paper)

This section shows results for estimating the vector Δx connecting an input to its closest example of the opposite class in binary datasets.
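The cosine-similarity column of the tables below can be computed as in this short sketch (illustrative; it assumes batched tensors holding the estimated and ground-truth Δx):

```python
import torch.nn.functional as F

def mean_delta_x_cosine(estimated, ground_truth):
    # Per-example cosine similarity between the estimated and the
    # ground-truth delta_x, averaged over the evaluation set.
    sims = F.cosine_similarity(estimated.flatten(1), ground_truth.flatten(1), dim=1)
    return sims.mean().item()
```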

Image Datasets

| Dataset | Input image | Estimated vector Δx to the closest point of the opposite class | Ground truth for Δx | Cosine similarity between estimate and ground truth |
| --- | --- | --- | --- | --- |
| Square | (image) | (image) | (image) | 0.869 |
| MNIST-3/5 | (image) | (image) | — | — |

Sphere Dataset

| Class | ‖x‖ | Estimated ‖Δx‖ | Ground-truth ‖Δx‖ | Estimated ‖x + Δx‖ | Ground-truth ‖x + Δx‖ |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.0 | 0.31 | 0.3 | 1.29 | 1.3 |
| 1 | 1.3 | 0.34 | 0.3 | 1.00 | 1.0 |

Indirect Gradient Estimation (Section 2.3.2 in the paper)

This section shows results for indirectly estimating the vector Δx connecting an input to its closest example in the support of another class.
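The indirect estimate is obtained by searching the manifold of a trained generator for the closest example of a destination class (see the real_samples...<destination_class>.png outputs described below). Here is a minimal sketch of such a latent-space projection, assuming a conditional generator called as generator(z, target_class); the signature, step count, and optimizer settings are assumptions, not the repository's exact loop:

```python
import torch

def project_to_class(generator, x, target_class, z_dim, steps=600, lr=0.05, z_penalty=0.0):
    # Optimize a latent code z so that generator(z, target_class) gets as
    # close as possible to the input x; the residual is the estimated
    # delta_x to the support of the destination class.
    z = torch.zeros(x.size(0), z_dim, device=x.device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        x_prime = generator(z, target_class)
        # Squared reconstruction distance plus an optional penalty on ||z||.
        loss = ((x_prime - x).flatten(1).pow(2).sum(1).mean()
                + z_penalty * z.pow(2).sum(1).mean())
        loss.backward()
        optimizer.step()
    x_prime = generator(z, target_class).detach()
    return x_prime, x_prime - x  # closest example and estimated delta_x
```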

Image Datasets

| Dataset | Input image | Estimated vector Δx to the closest point of the closest class | Ground truth for Δx | Cosine similarity between estimate and ground truth |
| --- | --- | --- | --- | --- |
| Square32 | (image) | (image) | (image) | 0.608 |
| MNIST | (image) | (image) | — | — |
| CIFAR-10 | (image) | (image) | — | — |

Robustness

Image Datasets

| Dataset | Method | Input image | Gradients of loss w.r.t. inputs | Accuracy (%) | Median robustness against PGD attack | Median robustness against PGD attack | Median robustness against Square attack | Average alignment between gradient and Δx | Average alignment (literature metric) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Square | Baseline | (image) | (image) | 100% | 0.022 | 12.3 | 0.107 | 0.0173 | 0.0087 |
| Square | Cosine penalty | (image) | (image) | 100% | 0.408 | 63.0 | 0.405 | 0.927 | 0.353 |
| Square | PGD | (image) | (image) | 100% | 0.479 | 62.1 | 0.323 | 0.198 | 0.179 |
| MNIST-3/5 | Baseline | (image) | (image) | 99.7% | 0.192 | 2.28 | 0.234 | 0.194 | 0.011 |
| MNIST-3/5 | Cosine penalty | (image) | (image) | 98.9% | 0.347 | 3.46 | 0.341 | 0.609 | 0.255 |
| MNIST-3/5 | PGD | (image) | (image) | 99.6% | 0.536 | 4.00 | 0.493 | 0.375 | 0.041 |
| MNIST | Baseline | (image) | (image) | 98.9% | 0.167 | 2.12 | 0.185 | 0.0775 | 0.045 |
| MNIST | Cosine penalty | (image) | (image) | 99.1% | 0.313 | 3.52 | 0.297 | 0.575 | 0.319 |
| MNIST | PGD | (image) | (image) | 99.3% | 0.563 | 4.34 | 0.49 | 0.169 | 0.040 |
| CIFAR-10 | Baseline | (image) | (image) | 84.1% | 0.008 | 0.277 | 0.010 | 0.008 | 0.008 |
| CIFAR-10 | Cosine penalty | (image) | (image) | 81.0% | 0.013 | 0.454 | 0.017 | 0.022 | 0.020 |
| CIFAR-10 | PGD | (image) | (image) | 82.7% | 0.023 | 0.774 | 0.028 | 0.0393 | 0.0305 |

Sphere Dataset

| Method | Accuracy (%) | Median robustness against PGD attack | Median robustness against PGD attack | Median robustness against Square attack | Average alignment between gradient and Δx | Average alignment (literature metric) |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 100% | 0.0055 | 0.096 | 0.0062 | 0.637 | 0.637 |
| Cosine penalty | 100% | 0.0077 | 0.133 | 0.0084 | 0.886 | 0.886 |
| PGD | 100% | 0.0074 | 0.127 | 0.0082 | 0.852 | 0.852 |

Outputs

All the outputs of the model are saved in the runs folder, inside a folder for the specific experiment you are running (<experiment name>_<timestamp-id>). The following files are saved:

  • tensorboard/events.out.tfevents.<...>: tensorboard file for following the training losses and validation score in real-time and for checking their evolution through the epochs.
  • real_samples.png: a fixed batch of validation examples for which outputs will be printed
  • real_samples_gt.txt: the label for each of the fixed validation images
  • delta_x_gt.png: ground truth for Δx when training the direct generator, or the generated Δx when training the classifier.
  • robust_<epoch>.png: graph of accuracy as a function of perturbation norm of attacks.
  • cosine_similarity_correct_val_<epoch>.png: histogram of the cosine similarities between the gradient of the model with respect to the inputs and Δx.
  • adversarial_samples_val_attack<epoch>.png: examples of images attacked with the selected attack method.
  • adversarial_samples_gradient_val<epoch>.png: gradients of the loss with respect to the images, normalized to the [-1, 1] range.
  • delta_x_samples_<epoch>.png: estimated residual to the closest example of the closest class, created when training the direct generator.
  • xprime_samples_<epoch>.png: generated closest example of the closest class, at the end of that epoch, when training the direct generator.
  • real_samplesxzinit<destination_class>.png: the generated closest example after the first 600 iterations of optimization to project onto the manifold of the indirect generator, with the penalty on the z norm set to 0.
  • real_samples<destination_class>.png: the generated closest example of the destination class, when performing optimization to project onto the manifold of the indirect generator.
  • state_dict_g_best_epoch: checkpoint for the generator model for the epoch with the highest validation score.
  • state_dict_d_best_epoch: checkpoint for the classifier model for the epoch with the highest validation score (see the loading sketch after this list).
  • log.txt: the configurations used for the run and the losses and scores of the model in text format, without needing to load tensorboard.
  • command: command used to run the python script, including all the parser arguments.
  • csv_file.csv: table containing per-example alignment statistics for all classes, as described in Table 6, Section 3.4 of the paper.
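The best-epoch checkpoints listed above can presumably be loaded as plain PyTorch state_dicts (a hedged sketch; the exact contents of the checkpoint files depend on the repository code):

```python
import torch

def load_checkpoint(model, path):
    # Assumes the file (e.g., state_dict_g_best_epoch) holds a bare
    # state_dict; map_location="cpu" allows inspection without a GPU.
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model
```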

Credits for external code

License

Most files of this project are licensed under the MIT License. Some files in this repository contain code snippets originating from other MIT-licensed files. Files that are not licensed under the MIT License:

  • Files in src/indirect_method/advertorch/ and src/direct_method/advertorch/ are licensed under the GNU LESSER GENERAL PUBLIC LICENSE Version 3.

By: Ricardo Bigolin Lanfredi, ricbl@sci.utah.edu, ricbl.github.io.
