Code for the Generative Adversarial Examples paper

Constructing Unrestricted Adversarial Examples with Generative Models

This repo contains the code needed to reproduce the main results of the paper Constructing Unrestricted Adversarial Examples with Generative Models, NIPS 2018, Montréal, Canada.

by Yang Song, Rui Shu, Nate Kushman and Stefano Ermon, Stanford AI Lab.

We propose Generative Adversarial Examples, a new kind of adversarial example for machine learning systems. Unlike traditional adversarial examples, which are crafted by adding norm-bounded perturbations to clean images, generative adversarial examples are realistic images synthesized entirely from scratch and are not restricted to small norm-balls. This new attack demonstrates the danger of a stronger threat model, under which traditional defenses against perturbation-based adversarial examples fail.
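The contrast between the two threat models can be written compactly. The notation below is an illustrative assumption rather than something defined in this README: f is the attacked classifier, o is a human labeler (the oracle), g is a conditional generator with latent code z, and y is the class the generator is conditioned on.

```latex
% Perturbation-based: a clean image x plus a small, norm-bounded change
x_{\mathrm{adv}} = x + \delta, \qquad \|\delta\|_p \le \epsilon, \qquad f(x_{\mathrm{adv}}) \ne o(x_{\mathrm{adv}}) = o(x)

% Generative: an image synthesized by the generator, with no norm constraint
x_{\mathrm{adv}} = g(z, y), \qquad f(x_{\mathrm{adv}}) \ne o(x_{\mathrm{adv}}) = y
```

In both cases the classifier disagrees with the human label; only the first ties the adversarial image to a specific clean image through a norm bound.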


Here are links to the datasets used in our experiments:

Running Experiments

Training AC-GANs

To mount a generative adversarial attack, we first need a good conditional generative model, so that we can search the manifold of realistic images for adversarial ones. You can use to do this. For example, the following command

CUDA_VISIBLE_DEVICES=0 python --dataset mnist --checkpoint_dir checkpoints/

will train an AC-GAN on the MNIST dataset with GPU #0 and output the weight files to the checkpoints/ directory.

Run python --help to see more available argument options.
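For readers new to AC-GANs, here is a minimal sketch of the two losses in the original AC-GAN formulation (Odena et al.), in plain Python. It is not taken from this repo: all function names are hypothetical, and the training script here may use a different GAN loss, so treat it as background only.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of class logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - log_z for v in logits]


def mean(values):
    return sum(values) / len(values)


def class_log_likelihood(batch_logits, labels):
    """Mean log P(C = label | x) under the auxiliary classifier head."""
    return mean([log_softmax(l)[y] for l, y in zip(batch_logits, labels)])


def source_log_likelihood(p_real_on_real, p_real_on_fake, tiny=1e-12):
    """L_S = E[log P(real | x_real)] + E[log P(fake | x_fake)]."""
    real_term = mean([math.log(p + tiny) for p in p_real_on_real])
    fake_term = mean([math.log(1.0 - p + tiny) for p in p_real_on_fake])
    return real_term + fake_term


def acgan_losses(p_real_on_real, p_real_on_fake,
                 logits_real, logits_fake, y_real, y_fake):
    """Return (discriminator_loss, generator_loss), both to be minimized."""
    l_s = source_log_likelihood(p_real_on_real, p_real_on_fake)
    l_c = (class_log_likelihood(logits_real, y_real)
           + class_log_likelihood(logits_fake, y_fake))
    d_loss = -(l_s + l_c)  # the discriminator maximizes L_S + L_C
    g_loss = -(l_c - l_s)  # the generator maximizes L_C - L_S
    return d_loss, g_loss
```

The class term is shared by both players, which is what pushes the generator to produce images that match the label it was conditioned on.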

Generative Adversarial Attack

After the AC-GAN is trained, you can use to perform targeted or untargeted attacks. You can also use to evaluate the accuracy and PGD robustness of a trained neural network classifier. For example, the following command

CUDA_VISIBLE_DEVICES=0 python --mode targeted_attack --dataset mnist --classifier zico --source 0 --target 1

attacks the provable defense method from Kolter & Wong, 2018 on the MNIST dataset, with the source class being 0 and target class being 1.

Run python --help to view more argument options. For hyperparameters such as --noise, --lambda1, --lambda2, --eps, --z_eps, --lr, and --n_iters (in that order), please refer to Table 4 in the Appendix of our paper.
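To give a feel for what weights like these trade off, here is a rough sketch of a targeted attack objective over the latent code z. It assumes three terms: a loss that pushes the attacked classifier toward the target class, a hinge penalty that keeps z within eps of its starting point z0, and an auxiliary-classifier loss that keeps the image in the source class. All function and argument names are hypothetical and are not claimed to match the command-line flags above; see the paper for the actual objective.

```python
import math


def neg_log_prob(probs, label, tiny=1e-12):
    """Cross-entropy of a probability vector against one label."""
    return -math.log(probs[label] + tiny)


def latent_penalty(z, z0, eps):
    """Mean hinge penalty; zero while every |z_i - z0_i| <= eps."""
    return sum(max(abs(a - b) - eps, 0.0) for a, b in zip(z, z0)) / len(z)


def attack_objective(z, z0, source, target,
                     generator, classifier, aux_classifier,
                     lambda1, lambda2, eps):
    """Weighted sum of the three terms; lower is better for the attacker.

    generator(z, label) returns an image; classifier(x) and
    aux_classifier(x) each return a list of class probabilities.
    """
    x = generator(z, source)                      # image conditioned on the source class
    l0 = neg_log_prob(classifier(x), target)      # fool the attacked classifier
    l1 = latent_penalty(z, z0, eps)               # stay near the initial latent code
    l2 = neg_log_prob(aux_classifier(x), source)  # still look like the source class
    return l0 + lambda1 * l1 + lambda2 * l2
```

In practice one would minimize this with a gradient-based optimizer in a deep learning framework; the plain-Python version only shows how the terms combine.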

Evaluating Generative Adversarial Examples

In the paper, we use Amazon Mechanical Turk to evaluate whether our generative adversarial examples are legitimate. We provide HTML files for the labelling interface in the folder mturk_websites.
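Once annotator labels are collected, they need to be turned into a legitimate/illegitimate decision per image. The sketch below assumes a simple strict-majority rule: an example counts as legitimate when more than half of its annotators pick the source class. This README does not state the rule, and the function names are hypothetical.

```python
from collections import Counter


def majority_label(annotator_labels):
    """Return the label chosen by a strict majority of annotators, else None."""
    if not annotator_labels:
        return None
    label, votes = Counter(annotator_labels).most_common(1)[0]
    return label if votes > len(annotator_labels) / 2 else None


def is_legitimate(annotator_labels, source_class):
    """True when the majority label agrees with the intended source class."""
    return majority_label(annotator_labels) == source_class


def legitimacy_rate(all_annotations, source_class):
    """Fraction of examples judged legitimate under the majority rule."""
    verdicts = [is_legitimate(labels, source_class) for labels in all_annotations]
    return sum(verdicts) / len(verdicts)
```

For example, is_legitimate([0, 0, 0, 1, 8], 0) holds because three of the five annotators chose class 0.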


Perturbation-based adversarial examples (top row) vs. generative adversarial examples (bottom row):


Targeted generative adversarial examples against robust classifiers on MNIST. (Green borders denote legitimate generative adversarial examples, while red borders denote illegitimate ones. The tiny white text at the top-left corner of a red image denotes the label given by the annotators.)


We also have samples for the SVHN dataset:


Finally, here are the results for CelebA:



If you find the idea or code useful for your research, please consider citing our paper:

  title = {Constructing Unrestricted Adversarial Examples with Generative Models},
  author = {Song, Yang and Shu, Rui and Kushman, Nate and Ermon, Stefano},
  booktitle = {Advances in Neural Information Processing Systems (NIPS)},
  year = {2018},