


A PyTorch-integrated Gym environment for enabling and evaluating new research into attacking and defending neural networks with adversarial perturbations. While most research on adversarial perturbations has focused on gradient-based, decision-boundary, or transferability methods, this environment generalizes the adversarial attack problem as a Markov decision process (MDP). This allows methods from reinforcement learning to be applied in a relatively straightforward way, a possibility that has only rarely been suggested or investigated in the ML security literature. There are, of course, technical challenges in applying RL here, but they are worth investigating. In the spirit of Gym, no assumptions are made about the structure of your agent; in particular, adversaries can be either MDP-aware or MDP-agnostic, which means that non-RL attacks can be evaluated within the same framework.
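To make the MDP framing concrete, here is a minimal, self-contained sketch of an attack episode: the state is the current perturbed input, the action is a perturbation step, and the reward is the drop in the victim model's confidence in the true class. Everything below (`ToyAttackEnv`, the stand-in linear "model", the 0.5 misclassification threshold) is hypothetical illustration, not this repo's actual API.

```python
import numpy as np

class ToyAttackEnv:
    """Hypothetical sketch: an adversarial attack as a Gym-style MDP."""

    def __init__(self, x, true_label, epsilon=0.3):
        self.x_orig = np.asarray(x, dtype=float)
        self.true_label = true_label
        self.epsilon = epsilon  # L-inf perturbation budget

    def _confidence(self, x):
        # Stand-in for a victim model: softmax over fixed linear logits.
        W = np.array([[1.0, -1.0], [-1.0, 1.0]])
        logits = W @ x
        e = np.exp(logits - logits.max())
        return (e / e.sum())[self.true_label]

    def reset(self):
        self.x = self.x_orig.copy()
        self.prev_conf = self._confidence(self.x)
        return self.x

    def step(self, action):
        # Apply the perturbation, then project back into the epsilon ball.
        self.x = np.clip(self.x + action,
                         self.x_orig - self.epsilon,
                         self.x_orig + self.epsilon)
        conf = self._confidence(self.x)
        reward = self.prev_conf - conf   # reward = confidence drop this step
        self.prev_conf = conf
        done = conf < 0.5                # episode ends once misclassified
        return self.x, reward, done, {}

# A trivially MDP-agnostic adversary: random perturbation steps.
env = ToyAttackEnv(x=[1.0, 0.0], true_label=0)
obs = env.reset()
for _ in range(10):
    action = np.random.uniform(-0.05, 0.05, size=2)
    obs, reward, done, info = env.step(action)
    if done:
        break
```

Because the agent only sees `reset`/`step`, any attack strategy, RL-based or not, can be plugged into the same loop.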

Note: still in development, but the stable branch should be self-contained.




  • Build out core functionality
  • Minimally test environment with pytorch-classification
  • Generalize to non-CIFAR datasets
  • Extend to custom samplers to expose environment dynamics
  • Extend for different reward functions
  • Refactor to incorporate reward Wrappers
  • Generalize confidence calculation to other activations
  • Implement a strict argument for strict epsilon-ball enforcement
  • Test basic functionality of Untargeted and StaticTargeted wrappers
  • Extend TensorBox to use numpy low/high API
  • Formalize unit testing
  • Update with Gym's new closer registry
  • Test basic functionality of DynamicTargeted and DefendMode wrappers
  • Spec out and integrate CI platform
  • Comment/document reward Wrappers (might need to do some more)
  • Test new features
  • Add BadNets functionality
  • Make UniformSampler memory-efficient
  • Minimally test environment with torchvision models (esp. non-CIFAR datasets)
  • Spec out rewriting for Foolbox integration/decide if it's worth it
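Several roadmap items above concern reward Wrappers (e.g. the Untargeted and StaticTargeted variants). The Gym Wrapper pattern they refer to can be sketched as follows: rescore an environment's transitions without touching its dynamics. The class names, the `info["probs"]` convention, and the stub environment below are assumptions for illustration only, not this repo's real interfaces.

```python
import numpy as np

class StubEnv:
    """Stand-in environment that exposes class probabilities via info."""
    def reset(self):
        return np.zeros(2)
    def step(self, action):
        probs = np.array([0.4, 0.6])  # pretend the model now prefers class 1
        return np.zeros(2), 0.0, False, {"probs": probs}

class UntargetedRewardWrapper:
    """Hypothetical reward Wrapper: reward 1 when the model is wrong
    about the true class, regardless of which wrong class wins."""
    def __init__(self, env, true_label):
        self.env = env
        self.true_label = true_label

    def reset(self):
        return self.env.reset()

    def step(self, action):
        obs, _, done, info = self.env.step(action)
        probs = info["probs"]
        reward = float(np.argmax(probs) != self.true_label)
        return obs, reward, done, info

wrapped = UntargetedRewardWrapper(StubEnv(), true_label=0)
obs = wrapped.reset()
obs, reward, done, info = wrapped.step(None)
```

A targeted variant would differ only in the reward line (e.g. rewarding `argmax(probs) == target_label`), which is why wrappers are a natural place to factor out reward functions.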