kubic71/RayS (forked from uclaml/RayS)

RayS: A Ray Searching Method for Hard-label Adversarial Attack (KDD2020)


Google Vision attacks

This fork applies the RayS untargeted L_inf-norm hard-label attack to the commercial Google Vision API. Below are the results of the experiments done so far.

The Google Vision API doesn't classify images into a fixed set of categories. Therefore we need our own binary decision function that defines what is and isn't a valid adversarial example.

Another problem is that one concept may be represented by multiple labels. For example, the labels "Cat", "Small to medium-sized cats", "Whiskers" and "Felidae" are all somewhat close to the category "Cat". Ideally we would like to eliminate all such similar labels from the classification results.

Because RayS is inherently a binary-search algorithm driven by success/failure queries, the definition of an adversarial example directly affects how hard it is to find one.

We experimented with several boundary decision functions:

  • strict untargeted attack - any mention of any label from the object's label set in any label returned by the API is considered a failure
  • top-5 attack - no labels corresponding to the original concept may appear in the top-5 results
  • top-1 attack - only the first label is taken into account
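The three decision functions above can be sketched as simple predicates over the API's ranked label list (the function names and the plain list-of-strings label format are illustrative; the real Google Vision response is a list of label annotations with confidence scores):

```python
def is_adversarial_strict(labels, forbidden_words):
    """Strict untargeted attack: the attack fails if any forbidden word
    appears (case-insensitively) inside any label returned by the API."""
    return not any(
        word.lower() in label.lower()
        for label in labels
        for word in forbidden_words
    )


def is_adversarial_topk(labels, forbidden_words, k):
    """top-k attack: only the k highest-ranked labels are taken into account."""
    return is_adversarial_strict(labels[:k], forbidden_words)


# Labels as the API might rank them, highest confidence first (illustrative).
labels = ["Whiskers", "Cat", "Felidae", "Small to medium-sized cats"]
print(is_adversarial_strict(labels, ["cat", "felidae", "whiskers"]))  # False
print(is_adversarial_topk(labels, ["cat"], k=1))  # True ("Whiskers" passes)
```

Note that the substring check makes "Small to medium-sized cats" fail any decision function that forbids "cat", which matches the intent of eliminating near-duplicate labels.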

Cat attack

Original image of the cat

Hard attack

  • All labels containing "cat", "felidae" or "whiskers" were forbidden.
  • 3200 queries, L_inf = 0.196

Slightly easier attack

  • All labels containing "cat" were forbidden.
  • 3200 queries, L_inf = 0.1757

Cat top5

  • the word "cat" forbidden in the top-5 labels
  • 1600 queries, L_inf = 0.121

Cat top1

  • the word "cat" forbidden in the top-1 label
  • 1600 queries, L_inf = 0.0666

Shark attack

Original image of the shark

Shark top5

  • no mention of "Shark", "Fin", "Water", "Fish", "Carcharhiniformes", "Lamnidae" or "Lamniformes" allowed in the top-5 labels
  • 1600 queries, L_inf = 0.141

Shark top1

  • no mention of "Shark", "Fin", "Water", "Fish", "Carcharhiniformes", "Lamnidae" or "Lamniformes" allowed in the top-1 label
  • 1600 queries, L_inf = 0.0705

RayS: A Ray Searching Method for Hard-label Adversarial Attack (KDD2020)

"RayS: A Ray Searching Method for Hard-label Adversarial Attack"
Jinghui Chen, Quanquan Gu
https://arxiv.org/abs/2006.12792

Examples of successful untargeted attack

This repository contains our PyTorch implementation of RayS from the paper "RayS: A Ray Searching Method for Hard-label Adversarial Attack" (accepted by KDD 2020).

What is RayS

RayS is a hard-label adversarial attack which only requires the target model's hard-label output (prediction label).

It is gradient-free and hyperparameter-free, and it is independent of adversarial losses such as CrossEntropy or C&W.

Therefore, RayS can be used as a good sanity check for possible "falsely robust" models (models that may overfit to certain types of gradient-based attacks and adversarial losses).
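At its core, RayS casts the hard-label attack as a search over ray directions: for each candidate direction it binary-searches the radius at which the ray crosses the decision boundary, using only hard-label queries. Below is a toy sketch of that radius search, with a 2D linear classifier standing in for the real model; it is an illustration of the idea, not the repository's implementation:

```python
import numpy as np

def is_adversarial(x, original_label, classify):
    """Hard-label oracle: True iff the model no longer outputs the original label."""
    return classify(x) != original_label

def radius_along_ray(x0, d, original_label, classify, r_max=10.0, tol=1e-4):
    """Binary-search the boundary-crossing radius along the ray x0 + r * d,
    with d normalized to unit L_inf length. Returns inf if the ray never
    crosses the boundary within r_max."""
    d = d / np.max(np.abs(d))
    if not is_adversarial(x0 + r_max * d, original_label, classify):
        return np.inf
    lo, hi = 0.0, r_max
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if is_adversarial(x0 + mid * d, original_label, classify):
            hi = mid          # still adversarial: boundary is closer
        else:
            lo = mid          # not adversarial yet: boundary is farther
    return hi

# Toy 2D linear classifier with its decision boundary at x[0] = 0.
classify = lambda x: int(x[0] > 0)
x0 = np.array([1.0, 0.0])                       # labeled 1, distance 1.0 to boundary
r = radius_along_ray(x0, np.array([-1.0, 0.0]), classify(x0), classify)
# r converges to 1.0, the L_inf radius at which this ray crosses the boundary
```

Each binary-search step costs one query, which is why the decision function (and thus the oracle) fully determines the attack's difficulty.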

Average Decision Boundary Distance (ADBD)

The RayS paper also proposes a new model-robustness metric, ADBD (average decision boundary distance), which reflects the examples' average distance to their closest decision boundary.
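Given the per-example boundary distances that RayS finds, ADBD is simply their mean; a minimal sketch (how examples whose search fails within the query budget are handled is not covered here):

```python
import numpy as np

def adbd(distances):
    """Average Decision Boundary Distance: the mean L_inf distance from
    each example to the nearest decision boundary found by RayS."""
    return float(np.mean(np.asarray(distances, dtype=float)))

# Hypothetical per-example boundary distances for a 5-example evaluation
print(adbd([0.031, 0.045, 0.028, 0.060, 0.036]))  # approximately 0.04
```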

Model Robustness: ADBD Leaderboard

We tested the robustness of recently proposed robust models trained on the CIFAR-10 dataset with maximum L_inf perturbation strength epsilon=0.031 (8/255). Robustness is evaluated on the entire CIFAR-10 test set (10,000 examples).

Note:

  • Ranking is based on the ADBD (average decision boundary distance) metric under the RayS attack with the default query limit of 40,000. Reducing the query limit accelerates the process but may lead to inaccurate ADBD values. For a quick check, we recommend evaluating on a subset of the CIFAR-10 test set (e.g., 1,000 examples).
  • * denotes a model trained with extra data.
  • Robust Acc (RayS) is the robust accuracy under the RayS attack for L_inf perturbation strength epsilon=0.031 (8/255). For truly robust models, this value could be larger than the reported value (obtained with white-box attacks) due to the hard-label limitation. For the current best robust-accuracy evaluation, please refer to AutoAttack, which uses an ensemble of four white-box/black-box attacks.
  • ADBD is our proposed Average Decision Boundary Distance metric, which is independent of the perturbation strength epsilon. It reflects overall model robustness through the lens of decision-boundary distance and can serve as a complement to the traditional robust-accuracy metric. Furthermore, ADBD depends only on hard-label output and can be used where back-propagation or even soft labels are unavailable.
Method | Natural Acc | Robust Acc (Reported) | Robust Acc (RayS) | ADBD
--- | --- | --- | --- | ---
WAR (Wu et al., 2020)* | 85.6 | 59.8 | 63.2 | 0.0480
RST (Carmon et al., 2019)* | 89.7 | 62.5 | 64.6 | 0.0465
HYDRA (Sehwag et al., 2020)* | 89.0 | 57.2 | 62.1 | 0.0450
MART (Wang et al., 2020)* | 87.5 | 65.0 | 62.2 | 0.0439
UAT++ (Alayrac et al., 2019)* | 86.5 | 56.3 | 62.1 | 0.0426
Pretraining (Hendrycks et al., 2019)* | 87.1 | 57.4 | 60.1 | 0.0419
Robust-overfitting (Rice et al., 2020) | 85.3 | 58.0 | 58.6 | 0.0404
TRADES (Zhang et al., 2019b) | 85.4 | 56.4 | 57.3 | 0.0403
Backward Smoothing (Chen et al., 2020) | 85.3 | 54.9 | 55.1 | 0.0403
Adversarial Training (retrained) (Madry et al., 2018) | 87.4 | 50.6 | 54.0 | 0.0377
MMA (Ding et al., 2020) | 84.4 | 47.2 | 47.7 | 0.0345
Adversarial Training (original) (Madry et al., 2018) | 87.1 | 47.0 | 50.7 | 0.0344
Fast Adversarial Training (Wong et al., 2020) | 83.8 | 46.1 | 50.1 | 0.0334
Adv-Interp (Zhang & Xu, 2020) | 91.0 | 68.7 | 46.9 | 0.0305
Feature-Scatter (Zhang & Wang, 2019) | 91.3 | 60.6 | 44.5 | 0.0301
SENSE (Kim & Wang, 2020) | 91.9 | 57.2 | 43.9 | 0.0288

Please contact us if you want to add your model to the leaderboard.

How to use RayS to evaluate your model robustness:

Prerequisites:

  • Python
  • NumPy
  • PyTorch or TensorFlow
  • CUDA

PyTorch models

Import the RayS attack and wrap your model:

from general_torch_model import GeneralTorchModel
from RayS import RayS

torch_model = GeneralTorchModel(model, n_class=10, im_mean=None, im_std=None)
attack = RayS(torch_model, epsilon=args.epsilon)

where:

  • torch_model is the PyTorch model under the GeneralTorchModel wrapper. For models that take normalized inputs (outside the range [0, 1]), simply set, for instance, im_mean=[0.5, 0.5, 0.5] and im_std=[0.5, 0.5, 0.5],
  • epsilon is the maximum adversarial perturbation strength.
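The im_mean/im_std arguments suggest the wrapper normalizes [0, 1] inputs before the forward pass and reduces the model's output to a hard label via argmax. A framework-agnostic sketch of that behavior (HardLabelModel is a hypothetical stand-in, not the repository's actual GeneralTorchModel):

```python
import numpy as np

class HardLabelModel:
    """Minimal sketch of a hard-label wrapper: normalize the [0, 1] input
    with per-channel mean/std, run the model, return only the argmax label."""
    def __init__(self, model_fn, im_mean=None, im_std=None):
        self.model_fn = model_fn  # maps an image to a vector of class scores
        self.im_mean = None if im_mean is None else np.asarray(im_mean)
        self.im_std = None if im_std is None else np.asarray(im_std)

    def predict_label(self, x):
        if self.im_mean is not None:
            x = (x - self.im_mean) / self.im_std
        return int(np.argmax(self.model_fn(x)))

# Toy "model": class-0 score is the pixel sum, class-1 score is fixed at 1.0.
model_fn = lambda x: np.array([x.sum(), 1.0])
wrapped = HardLabelModel(model_fn, im_mean=[0.5], im_std=[0.5])
print(wrapped.predict_label(np.array([0.9, 0.9])))  # 0: normalized sum 1.6 > 1.0
```

The attack then only ever sees predict_label, which is exactly the hard-label restriction RayS is designed for.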

To run the RayS attack, use

x_adv, queries, adbd, succ = attack(data, label, query_limit)

It returns:

  • x_adv: the adversarial examples found by RayS,
  • queries: the number of queries used to find the adversarial examples,
  • adbd: the decision boundary distance for each example,
  • succ: indicates whether each example was successfully attacked.

Sample usage for attacking a robust model:

  python3 attack_robust.py --dataset rob_cifar_trades --query 40000 --batch 1000 --epsilon 0.031

You can also pass --num 1000 to limit the number of attacked examples to 1000; the default num is 10000 (the whole CIFAR-10 test set).
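These per-example outputs also let you recompute the leaderboard quantities. One consistent reading (an assumption, not verified against the repository code) is that an example counts as robust at strength epsilon exactly when its decision-boundary distance exceeds epsilon:

```python
import numpy as np

def summarize(adbd, succ, epsilon):
    """Aggregate per-example RayS outputs into the leaderboard metrics.
    adbd: per-example boundary distances; succ: per-example success flags."""
    adbd = np.asarray(adbd, dtype=float)
    succ = np.asarray(succ, dtype=bool)
    return {
        "ADBD": float(adbd.mean()),
        "robust_acc": float(np.mean(~succ)),            # fraction not attacked
        "robust_acc_from_adbd": float(np.mean(adbd > epsilon)),
    }

# Hypothetical 3-example run: the two metrics agree under this reading.
stats = summarize([0.02, 0.05, 0.04], succ=[True, False, False], epsilon=0.031)
```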

TensorFlow models

To evaluate TensorFlow models with RayS attack:

from general_tf_model import GeneralTFModel
from RayS import RayS

tf_model = GeneralTFModel(model.logits, model.x_input, sess, n_class=10, im_mean=None, im_std=None)
attack = RayS(tf_model, epsilon=args.epsilon)

where:

  • model.logits: the logits tensor returned by the TensorFlow model,
  • model.x_input: the placeholder for the model input (NHWC format),
  • sess: the TensorFlow session.

The remaining steps are the same as for evaluating PyTorch models.

Reproduce experiments in the paper:

  • Run the attack on a naturally trained model (Inception):
  - python3 attack_natural.py --dataset inception --epsilon 0.05
  • Run the attack on a naturally trained model (ResNet):
  - python3 attack_natural.py --dataset resnet --epsilon 0.05
  • Run the attack on a naturally trained model (CIFAR-10):
  - python3 attack_natural.py --dataset cifar --epsilon 0.031
  • Run the attack on a naturally trained model (MNIST):
  - python3 attack_natural.py --dataset mnist --epsilon 0.3

Citation

Please check our paper for technical details and full results.

@inproceedings{chen2020rays,
  title={RayS: A Ray Searching Method for Hard-label Adversarial Attack},
  author={Chen, Jinghui and Gu, Quanquan},
  booktitle={Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  year={2020}
}

Contact

If you have any questions regarding the RayS attack or the ADBD leaderboard above, please contact jinghuic@ucla.edu. Enjoy!
