
Gradient Direction of Robust Models

This code reproduces part of the results presented in the paper "Quantifying the Preferential Direction of the Model Gradient in Adversarial Training With Projected Gradient Descent". The implementation was developed on Linux, using Python 3 and PyTorch.

The main objective of the paper is to quantitatively answer the question "In what direction do the gradients of a model with respect to its inputs align after robust training?". We propose as an answer the direction that connects the current input to the closest example, in decision space, in the support of the closest inaccurate class. To test whether this direction is directly related to robustness, we propose a metric measuring the alignment between the gradient and the proposed direction. We show that this alignment increases with robust training, and that the proposed direction achieves closer alignment than another gradient-alignment metric from the literature. We also show that increasing the alignment of the gradient through a penalty term on the loss increases robustness.
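To make the metric concrete, here is a minimal PyTorch sketch of a per-example gradient alignment and of a loss with an alignment penalty (an illustration, not the repository's implementation; the function names, the delta_x tensor, and the weight parameter are assumptions):

```python
import torch
import torch.nn.functional as F

def gradient_alignment(model, x, y, delta_x):
    # Cosine similarity between the gradient of the loss with respect
    # to the input and a target direction delta_x, per example.
    x = x.detach().clone().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    grad, = torch.autograd.grad(loss, x, create_graph=True)
    return F.cosine_similarity(grad.flatten(1), delta_x.flatten(1), dim=1)

def penalized_loss(model, x, y, delta_x, weight=1.0):
    # Cross-entropy minus the mean alignment, so that minimizing the
    # loss also pushes the input-gradient toward delta_x.
    ce = F.cross_entropy(model(x), y)
    return ce - weight * gradient_alignment(model, x, y, delta_x).mean()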

The COPD dataset used in the paper is private and is not available to outside investigators. The respective code is provided for reference purposes only.

Setup

  • To install all the needed libraries, you can use the requirements_indirect.sh and requirements_direct.sh files. They assume you have conda or miniconda installed, and they create conda environments called gdrm_indirect and gdrm_direct with the prerequisites installed. Activate the corresponding environment before running the code: conda activate gdrm_indirect for experiments with the indirect method for estimating Δx (Section 2.3.2 in the paper), or conda activate gdrm_direct for experiments with the direct method for estimating Δx (Section 2.3.1 in the paper).
  • Each folder inside the src/ folder has its own requirements.txt listing imported libraries and versions.

Usage

Check the file running_commands.csv for the commands used for all the training and validation runs behind the paper and the results presented here.

  • All commands select the GPU indexed by 0, but you can change the gpus argument according to your needs.
  • For the test commands, replace the <timestamps-id> expression with the respective value of the training experiment folder.
  • For the test commands with the CIFAR-10 dataset, replace the <best_epoch> expression with the best epoch in terms of epsilon_0.5, considering values from epoch 33 to epoch 100 (see the sketch after this list for one way such a score can be computed).
  • For the ImageNet command, replace the <robustbench_model_name> expression with the desired model from the Linf eps=4/255 ImageNet RobustBench leaderboard.
  • You can run python -m src.indirect_method.train --help and python -m src.direct_method.train --help to see all available options for modifying the runs.
  • All commands should be run from the project base folder.
  • To check test scores, open the log.txt file inside the experiment folder (./runs/...).
  • The first time some datasets are used, H5 files are created for faster loading in subsequent runs, so the first run may take an unusually long time to start producing outputs.
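For context on the epsilon_0.5 score mentioned above: assuming it denotes the median robustness, i.e., the perturbation norm at which accuracy under attack drops to 50%, it can be read off an accuracy-versus-epsilon curve as in this minimal sketch (the function and argument names are illustrative, not from the repository):

```python
import numpy as np

def epsilon_at_accuracy(epsilons, accuracies, threshold=0.5):
    """Perturbation norm at which accuracy first falls below `threshold`;
    with threshold=0.5 this is the median robustness."""
    epsilons = np.asarray(epsilons, dtype=float)
    accuracies = np.asarray(accuracies, dtype=float)
    below = np.nonzero(accuracies < threshold)[0]
    if below.size == 0:
        return float(epsilons[-1])  # accuracy never drops below threshold
    i = below[0]
    if i == 0:
        return float(epsilons[0])
    # Linearly interpolate between the two points bracketing the crossing.
    e0, e1 = epsilons[i - 1], epsilons[i]
    a0, a1 = accuracies[i - 1], accuracies[i]
    return float(e0 + (a0 - threshold) * (e1 - e0) / (a0 - a1))
```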

Results of pre-trained models

Example pre-trained models are available at https://www.sci.utah.edu/~datasets/gradient-direction-of-robust-models/pretrained_models.zip. To reproduce the numbers in the tables below, use the commands provided in running_commands.csv, replacing the --load_checkpoint_g= and --load_checkpoint_d= arguments with the respective paths to the provided pre-trained models. For the MNIST-3/5, MNIST, and CIFAR-10 datasets, use the generator folder when training the models with the cosine alignment penalty, and the generator_reference folder when testing the alignment of the robust methods. For example, to generate the generator alignments and images for the Squares dataset, use:

```
python -m src.train --experiment=square_vrgan_test --gpus=0 --nepochs=1 --dataset_to_use=squares --skip_train=true --split_validation=test --vrgan_training=true --load_checkpoint_g=./pretrained_models/square/generator/state_dict_g_best_epoch
```

As another example, to get the images and black-box numbers for the cosine-penalty method on the MNIST-3/5 dataset, use:

```
python -m src.train --dataset_to_use=mnist --experiment=mnist_cosine_test_bbox --gpus=0 --split_validation=test --unet_downsamplings=2 --load_checkpoint_g=./pretrained_models/mnist35/generator_reference/state_dict_g_best_epoch --nepochs=1 --skip_train=true --epsilons_val_attack 0.02 0.04 0.06 0.1 0.14 0.2 0.4 0.6 0.8 1.0 1.2 1.4 --load_checkpoint_d=./pretrained_models/mnist35/cosine/state_dict_d_best_epoch --blackbox_attack=true
```

More complete results for the method, including results averaged over five random seeds, are given in the paper.

Direct Gradient Estimation (Section 2.3.1 in the paper)

This section shows results for estimating the vector Δx connecting an input to its closest example of the opposite class in binary datasets.
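The cosine-similarity column of the tables below can be computed as in this short sketch (illustrative; it assumes batched tensors holding the estimated and ground-truth Δx):

```python
import torch.nn.functional as F

def mean_delta_x_cosine(estimated, ground_truth):
    # Per-example cosine similarity between the estimated and the
    # ground-truth delta_x, averaged over the evaluation set.
    sims = F.cosine_similarity(estimated.flatten(1), ground_truth.flatten(1), dim=1)
    return sims.mean().item()
```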

Image Datasets

| Dataset | Input image | Estimated vector Δx to the closest point of the opposite class | Ground truth for Δx | Cosine similarity between estimate and ground truth |
| --- | --- | --- | --- | --- |
| Square | (image) | (image) | (image) | 0.869 |
| MNIST-3/5 | (image) | (image) | — | — |

Sphere Dataset

| Class | ‖x‖ | Estimated ‖Δx‖ | Ground-truth ‖Δx‖ | Estimated ‖x + Δx‖ | Ground-truth ‖x + Δx‖ |
| --- | --- | --- | --- | --- | --- |
| 0 | 1.0 | 0.31 | 0.3 | 1.29 | 1.3 |
| 1 | 1.3 | 0.34 | 0.3 | 1.00 | 1.0 |

Indirect Gradient Estimation (Section 2.3.2 in the paper)

This section shows results for indirectly estimating the vector Δx connecting an input to its closest example in the support of another class.
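The indirect estimate is obtained by searching the manifold of a trained generator for the closest example of a destination class (see the real_samples...<destination_class>.png outputs described below). Here is a minimal sketch of such a latent-space projection, assuming a conditional generator called as generator(z, target_class); the signature, step count, and optimizer settings are assumptions, not the repository's exact loop:

```python
import torch

def project_to_class(generator, x, target_class, z_dim, steps=600, lr=0.05, z_penalty=0.0):
    # Optimize a latent code z so that generator(z, target_class) gets as
    # close as possible to the input x; the residual is the estimated
    # delta_x to the support of the destination class.
    z = torch.zeros(x.size(0), z_dim, device=x.device, requires_grad=True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        x_prime = generator(z, target_class)
        # Squared reconstruction distance plus an optional penalty on ||z||.
        loss = ((x_prime - x).flatten(1).pow(2).sum(1).mean()
                + z_penalty * z.pow(2).sum(1).mean())
        loss.backward()
        optimizer.step()
    x_prime = generator(z, target_class).detach()
    return x_prime, x_prime - x  # closest example and estimated delta_x
```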

Image Datasets

| Dataset | Input image | Estimated vector Δx to the closest point of the closest class | Ground truth for Δx | Cosine similarity between estimate and ground truth |
| --- | --- | --- | --- | --- |
| Square32 | (image) | (image) | (image) | 0.608 |
| MNIST | (image) | (image) | — | — |
| CIFAR-10 | (image) | (image) | — | — |

Robustness

Image Datasets

| Dataset | Method | Input image | Gradients of loss w.r.t. inputs | Accuracy (%) | Median robustness against PGD attack | Median robustness against PGD attack | Median robustness against Square attack | Average alignment between gradient and Δx | Average alignment (literature metric) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Square | Baseline | (image) | (image) | 100% | 0.022 | 12.3 | 0.107 | 0.0173 | 0.0087 |
| Square | Cosine penalty | (image) | (image) | 100% | 0.408 | 63.0 | 0.405 | 0.927 | 0.353 |
| Square | PGD | (image) | (image) | 100% | 0.479 | 62.1 | 0.323 | 0.198 | 0.179 |
| MNIST-3/5 | Baseline | (image) | (image) | 99.7% | 0.192 | 2.28 | 0.234 | 0.194 | 0.011 |
| MNIST-3/5 | Cosine penalty | (image) | (image) | 98.9% | 0.347 | 3.46 | 0.341 | 0.609 | 0.255 |
| MNIST-3/5 | PGD | (image) | (image) | 99.6% | 0.536 | 4.00 | 0.493 | 0.375 | 0.041 |
| MNIST | Baseline | (image) | (image) | 98.9% | 0.167 | 2.12 | 0.185 | 0.0775 | 0.045 |
| MNIST | Cosine penalty | (image) | (image) | 99.1% | 0.313 | 3.52 | 0.297 | 0.575 | 0.319 |
| MNIST | PGD | (image) | (image) | 99.3% | 0.563 | 4.34 | 0.49 | 0.169 | 0.040 |
| CIFAR-10 | Baseline | (image) | (image) | 84.1% | 0.008 | 0.277 | 0.010 | 0.008 | 0.008 |
| CIFAR-10 | Cosine penalty | (image) | (image) | 81.0% | 0.013 | 0.454 | 0.017 | 0.022 | 0.020 |
| CIFAR-10 | PGD | (image) | (image) | 82.7% | 0.023 | 0.774 | 0.028 | 0.0393 | 0.0305 |

Sphere Dataset

| Method | Accuracy (%) | Median robustness against PGD attack | Median robustness against PGD attack | Median robustness against Square attack | Average alignment between gradient and Δx | Average alignment (literature metric) |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 100% | 0.0055 | 0.096 | 0.0062 | 0.637 | 0.637 |
| Cosine penalty | 100% | 0.0077 | 0.133 | 0.0084 | 0.886 | 0.886 |
| PGD | 100% | 0.0074 | 0.127 | 0.0082 | 0.852 | 0.852 |

Outputs

All the outputs of the model are saved in the runs folder, inside a folder for the specific experiment you are running (<experiment name>_<timestamp-id>). The following files are saved:

  • tensorboard/events.out.tfevents.<...>: tensorboard file for following the training losses and validation score in real-time and for checking their evolution through the epochs.
  • real_samples.png: a fixed batch of validation examples for which outputs will be printed
  • real_samples_gt.txt: the label for each of the fixed validation images
  • delta_x_gt.png: ground truth for Δx when training the direct generator, or the generated Δx when training the classifier.
  • robust_<epoch>.png: graph of accuracy as a function of perturbation norm of attacks.
  • cosine_similarity_correct_val_<epoch>.png: histogram of the cosine similarities between the gradient of the model with respect to the inputs and Δx.
  • adversarial_samples_val_attack<epoch>.png: examples of images attacked with the selected attack method.
  • adversarial_samples_gradient_val<epoch>.png: gradients of the loss with respect to the images, normalized to the [-1, 1] range.
  • delta_x_samples_<epoch>.png: estimated residual to the closest example of the closest class, created when training the direct generator.
  • xprime_samples_<epoch>.png: generated closest example of the closest class, at the end of that epoch, when training the direct generator.
  • real_samplesxzinit<destination_class>.png: the generated closest example after the first 600 iterations of optimization to project onto the manifold of the indirect generator, with the penalty on the z norm set to 0.
  • real_samples<destination_class>.png: the generated closest example of the destination class, when performing optimization to project onto the manifold of the indirect generator.
  • state_dict_g_best_epoch: checkpoint for the generator model for the epoch with the highest validation score.
  • state_dict_d_best_epoch: checkpoint for the classifier model for the epoch with the highest validation score (see the loading sketch after this list).
  • log.txt: the configurations used for the run and the losses and scores of the model in text format, without needing to load tensorboard.
  • command: command used to run the python script, including all the parser arguments.
  • csv_file.csv: table containing per-example alignment statistics for all classes, as described in Table 6, Section 3.4 of the paper.
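The best-epoch checkpoints listed above can presumably be loaded as plain PyTorch state_dicts (a hedged sketch; the exact contents of the checkpoint files depend on the repository code):

```python
import torch

def load_checkpoint(model, path):
    # Assumes the file (e.g., state_dict_g_best_epoch) holds a bare
    # state_dict; map_location="cpu" allows inspection without a GPU.
    model.load_state_dict(torch.load(path, map_location="cpu"))
    return model
```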

Credits for external code

License

Most files of this project are licensed under the MIT License. Some files in this repository contain code snippets originating from other MIT-licensed files. Files that are not licensed under the MIT License:

  • Files in src/indirect_method/advertorch/ and src/direct_method/advertorch/ are licensed under the GNU LESSER GENERAL PUBLIC LICENSE Version 3.

By: Ricardo Bigolin Lanfredi, ricbl@sci.utah.edu, ricbl.github.io.
