Box-constrained attacks (adversarial examples) in Tensorflow

This repository contains the code for the targeted and non-targeted attacks used by team Placeholder (Luiz Gustavo Hafemann and Le Thanh Nguyen-Meidine) in the NIPS 2017 adversarial attacks/defenses competition.

Our attacks are based on a formulation of the problem that minimizes the probability of the correct class (or maximizes the probability of a target class) while treating the distortion (the L_infinity norm of the adversarial noise) as a hard constraint. For this we used two algorithms: L-BFGS with box constraints and projected Stochastic Gradient Descent.
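
As an illustrative sketch (not the repository code), the hard L_infinity constraint can be enforced by projecting each candidate adversarial image back onto the epsilon-ball around the original image after every gradient step; the helper below and its argument names are hypothetical:

import numpy as np

def project_linf(x_adv, x_orig, eps, img_bounds=(0.0, 255.0)):
    # Hypothetical helper: enforce the hard constraint ||x_adv - x_orig||_inf <= eps ...
    x_adv = np.clip(x_adv, x_orig - eps, x_orig + eps)
    # ... and keep the result inside the valid pixel range.
    return np.clip(x_adv, img_bounds[0], img_bounds[1])

Projected SGD alternates a gradient step on the class probability with a projection of this kind, while L-BFGS-B handles the same bounds as box constraints inside the optimizer.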

Setup

For the attacks we used Python 2 (which was recommended for the competition), but the refactored version of the attacks in this main folder works with either Python 2 or Python 3.

Requirements:

  • numpy
  • tensorflow (tested with versions 1.2 and 1.4)
  • scipy (tested with version 0.19.1)

Usage

We are making three attacks available:

  • box_constrained_attack: Uses L-BFGS with box constraints as the optimizer
  • pgd_attack: Uses projected SGD (Stochastic Gradient Descent) as the optimizer
  • step_pgd_attack: Uses a mix of FGSM (Fast Gradient Sign Method) and SGD. We found this to converge faster when only a few iterations are allowed (e.g. 10-15); see the FGSM sketch below
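
For illustration only, a single FGSM step (the "fast" component mixed into step_pgd_attack) looks roughly like the following graph-mode sketch; the toy dense model and the constants are hypothetical stand-ins, not the repository code:

import tensorflow as tf

batch_shape = (16, 299, 299, 3)   # hypothetical input shape
eps = 16.0                        # L_inf budget, in pixel units for [0, 255] images

x = tf.placeholder(tf.float32, batch_shape)
labels = tf.placeholder(tf.int64, (batch_shape[0],))

def toy_model(inputs):
    # Stand-in for a real classifier; like the `model` callable below, it returns logits.
    flat = tf.reshape(inputs, (batch_shape[0], -1))
    return tf.layers.dense(flat, 1000)

loss = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=toy_model(x))
grad = tf.gradients(loss, x)[0]
# One step in the direction of the gradient sign, clipped to the valid image range:
x_fgsm = tf.clip_by_value(x + eps * tf.sign(grad), 0.0, 255.0)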

Example of usage:

pgd_attacker = pgd_attack.PGD_attack(model, 
                                     batch_shape, 
                                     max_epsilon=eps, 
                                     max_iter=max_iter, 
                                     targeted=False,
                                     initial_lr=1,
                                     lr_decay=0.99)

attack_img = pgd_attacker.generate(sess, imgs, pred, verbose=True)                                     

The parameters are explained in the docstring. For the method above:

  • model: Callable (function) that accepts an input tensor and returns the model logits (unnormalized log probabilities); see the sketch after this list
  • batch_shape: Input shape (tuple). Usually: (batch_size, height, width, channels)
  • max_epsilon: Maximum L_inf norm for the adversarial example
  • max_iter: Maximum number of gradient descent iterations
  • targeted: Boolean: true for targeted attacks, false for non-targeted attacks
  • img_bounds: Tuple [min, max]: bounds of the image. Example: [0, 255] for a non-normalized image, [-1, 1] for inception models.
  • initial_lr: Initial learning rate for the optimization
  • lr_decay: Learning rate decay (the learning rate is multiplied by this factor at each iteration)
  • rng: Random number generator
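
As a hedged example of the model argument, a callable wrapping the slim Inception V3 network could look like the sketch below (assuming a pre-trained checkpoint is restored separately and inputs are scaled to [-1, 1]; this is not part of the repository):

import tensorflow as tf
from tensorflow.contrib.slim.nets import inception

slim = tf.contrib.slim

def model(x_input):
    # Returns logits (unnormalized log probabilities), as the attacks expect.
    with slim.arg_scope(inception.inception_v3_arg_scope()):
        logits, _ = inception.inception_v3(x_input, num_classes=1001, is_training=False)
    return logits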

For a more comprehensive example, please check the provided IPython notebook.
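
A targeted variant of the same call would set targeted=True and pass the desired target class for each image to generate; the name target_labels below is hypothetical:

targeted_attacker = pgd_attack.PGD_attack(model,
                                          batch_shape,
                                          max_epsilon=eps,
                                          max_iter=max_iter,
                                          targeted=True,
                                          initial_lr=1,
                                          lr_decay=0.99)
adv_imgs = targeted_attacker.generate(sess, imgs, target_labels, verbose=True)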
